Control summary
Control results show how well models produce accessible code when given no instructions or prompts that specifically ask for accessible code. Models are ranked by WCAG pass rate across 4 test cases with 25 samples per test case (100 samples per model). These tests cover only a subset of the most common issues, not all WCAG requirements. WCAG failures may still exist even in samples that pass.
| Model | Rank | WCAG Pass Rate* | Avg Total WCAG Failures | Avg Axe WCAG Failures | Avg Assertion WCAG Failures | Avg Best Practice Failures |
|---|---|---|---|---|---|---|
| GPT-5.2 | 1 | 38% | 11.16 | 10.70 | 0.46 | 2.52 |
| GPT-5 Mini | 2 | 30% | 4.36 | 3.57 | 0.79 | 2.97 |
| GPT-5.2 Codex | 3 | 24% | 3.69 | 2.06 | 1.63 | 4.38 |
| Gemini 3.5 Pro Preview | 4 | 9% | 6.14 | 4.12 | 2.02 | 11.99 |
| Gemini 3 Flash Preview | 5 | 1% | 4.44 | 2.11 | 2.33 | 4.36 |
| Grok 4 Fast Non-Reasoning | 6 | 0% | 3.79 | 1.50 | 2.29 | 5.90 |
| Claude Haiku 4.5 | 7 | 0% | 9.34 | 7.09 | 2.25 | 11.90 |
| DeepSeek V3.2 | 8 | 0% | 9.77 | 7.81 | 1.96 | 4.16 |
| Claude Sonnet 4.5 | 9 | 0% | 12.17 | 9.91 | 2.26 | 15.21 |
| Claude Opus 4.6 | 10 | 0% | 18.62 | 17.19 | 1.43 | 12.05 |
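The ranking in the table above can be reproduced with a simple sort. The sketch below is illustrative and is not the benchmark's own code. It sorts by WCAG pass rate (highest first) and, as an assumption, breaks ties by lower average total WCAG failures. That tie-break is not stated in this report; it is only consistent with the order in which the 0% models appear.

```python
# Rows copied from the control summary table:
# (model, WCAG pass rate in %, average total WCAG failures per sample).
rows = [
    ("Claude Opus 4.6", 0, 18.62),
    ("Claude Sonnet 4.5", 0, 12.17),
    ("DeepSeek V3.2", 0, 9.77),
    ("Claude Haiku 4.5", 0, 9.34),
    ("Grok 4 Fast Non-Reasoning", 0, 3.79),
    ("Gemini 3 Flash Preview", 1, 4.44),
    ("Gemini 3.5 Pro Preview", 9, 6.14),
    ("GPT-5.2 Codex", 24, 3.69),
    ("GPT-5 Mini", 30, 4.36),
    ("GPT-5.2", 38, 11.16),
]


def rank_models(rows):
    """Sort by pass rate (descending), then by average failures (ascending).

    The tie-break on average failures is an assumption inferred from the table.
    """
    ordered = sorted(rows, key=lambda row: (-row[1], row[2]))
    return [(position, row[0]) for position, row in enumerate(ordered, start=1)]


ranking = rank_models(rows)
for position, model in ranking:
    print(position, model)
```

Note that rank follows pass rate only: GPT-5.2 ranks first despite a higher average failure count than several lower-ranked models.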
Pass@k aggregates
Pass@k estimates the probability that at least one of k randomly selected samples passes. This is computed from control samples only.
disclosure-widget
| Model | Samples | Passes | pass@1 | pass@5 | pass@10 |
|---|---|---|---|---|---|
| DeepSeek V3.2 | 25 | 0 | 0% | 0% | 0% |
| Claude Haiku 4.5 | 25 | 0 | 0% | 0% | 0% |
| Claude Opus 4.6 | 25 | 0 | 0% | 0% | 0% |
| Claude Sonnet 4.5 | 25 | 0 | 0% | 0% | 0% |
| Gemini 3 Flash Preview | 25 | 0 | 0% | 0% | 0% |
| Gemini 3.5 Pro Preview | 25 | 9 | 36% | 92% | 100% |
| GPT-5 Mini | 25 | 12 | 48% | 98% | 100% |
| GPT-5.2 | 25 | 14 | 56% | 99% | 100% |
| GPT-5.2 Codex | 25 | 22 | 88% | 100% | 100% |
| Grok 4 Fast Non-Reasoning | 25 | 0 | 0% | 0% | 0% |
modal-dialog
| Model | Samples | Passes | pass@1 | pass@5 | pass@10 |
|---|---|---|---|---|---|
| DeepSeek V3.2 | 25 | 0 | 0% | 0% | 0% |
| Claude Haiku 4.5 | 25 | 0 | 0% | 0% | 0% |
| Claude Opus 4.6 | 25 | 0 | 0% | 0% | 0% |
| Claude Sonnet 4.5 | 25 | 0 | 0% | 0% | 0% |
| Gemini 3 Flash Preview | 25 | 0 | 0% | 0% | 0% |
| Gemini 3.5 Pro Preview | 25 | 0 | 0% | 0% | 0% |
| GPT-5 Mini | 25 | 12 | 48% | 98% | 100% |
| GPT-5.2 | 25 | 4 | 16% | 62% | 89% |
| GPT-5.2 Codex | 25 | 0 | 0% | 0% | 0% |
| Grok 4 Fast Non-Reasoning | 25 | 0 | 0% | 0% | 0% |
shopping-home-page
| Model | Samples | Passes | pass@1 | pass@5 | pass@10 |
|---|---|---|---|---|---|
| DeepSeek V3.2 | 25 | 0 | 0% | 0% | 0% |
| Claude Haiku 4.5 | 25 | 0 | 0% | 0% | 0% |
| Claude Opus 4.6 | 25 | 0 | 0% | 0% | 0% |
| Claude Sonnet 4.5 | 25 | 0 | 0% | 0% | 0% |
| Gemini 3 Flash Preview | 25 | 1 | 4% | 20% | 40% |
| Gemini 3.5 Pro Preview | 25 | 0 | 0% | 0% | 0% |
| GPT-5 Mini | 25 | 1 | 4% | 20% | 40% |
| GPT-5.2 | 25 | 1 | 4% | 20% | 40% |
| GPT-5.2 Codex | 25 | 2 | 8% | 37% | 65% |
| Grok 4 Fast Non-Reasoning | 25 | 0 | 0% | 0% | 0% |
simple-contact-form
| Model | Samples | Passes | pass@1 | pass@5 | pass@10 |
|---|---|---|---|---|---|
| DeepSeek V3.2 | 25 | 0 | 0% | 0% | 0% |
| Claude Haiku 4.5 | 25 | 0 | 0% | 0% | 0% |
| Claude Opus 4.6 | 25 | 0 | 0% | 0% | 0% |
| Claude Sonnet 4.5 | 25 | 0 | 0% | 0% | 0% |
| Gemini 3 Flash Preview | 25 | 0 | 0% | 0% | 0% |
| Gemini 3.5 Pro Preview | 25 | 0 | 0% | 0% | 0% |
| GPT-5 Mini | 25 | 5 | 20% | 71% | 94% |
| GPT-5.2 | 25 | 19 | 76% | 100% | 100% |
| GPT-5.2 Codex | 25 | 0 | 0% | 0% | 0% |
| Grok 4 Fast Non-Reasoning | 25 | 0 | 0% | 0% | 0% |
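The values in the pass@k tables above are consistent with the commonly used unbiased estimator, 1 - C(n - c, k) / C(n, k), where n is the number of samples and c is the number of passing samples. Whether the harness computes pass@k exactly this way is an assumption. The sketch below is a minimal version of that estimator, and the function name `pass_at_k` is illustrative.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Probability that at least one of k samples, drawn without replacement
    from n samples of which c pass, is a passing sample."""
    if n - c < k:
        # Fewer than k failing samples exist, so any draw of k includes a pass.
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)


# Example row from the tables above: GPT-5.2 with 25 samples and 4 passes.
for k in (1, 5, 10):
    print(f"pass@{k} = {pass_at_k(25, 4, k):.0%}")
# pass@1 = 16%
# pass@5 = 62%
# pass@10 = 89%
```

A model with zero passes out of 25 always scores 0% for every k, which is why most rows in these tables are flat.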
Control analysis
This section summarizes where models perform well, where they struggle, and the most frequent types of accessibility issues observed across all samples.
Most common axe WCAG failures
| Rule | Impact | Failures | % of failures | Seen in models | Seen in test cases | Description |
|---|---|---|---|---|---|---|
| color-contrast | serious | 735 | 91.1% | 10 | 4 | Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds |
| link-name | serious | 30 | 3.7% | 3 | 1 | Ensure links have discernible text |
| button-name | critical | 15 | 1.9% | 2 | 1 | Ensure buttons have discernible text |
| aria-hidden-focus | serious | 10 | 1.2% | 1 | 2 | Ensure aria-hidden elements are not focusable nor contain focusable elements |
| aria-prohibited-attr | serious | 8 | 1.0% | 3 | 1 | Ensure ARIA attributes are not prohibited for an element's role |
| aria-required-children | critical | 3 | 0.4% | 1 | 2 | Ensure elements with an ARIA role that require child roles contain them |
| nested-interactive | serious | 2 | 0.2% | 1 | 1 | Ensure interactive controls are not nested as they are not always announced by screen readers or can cause focus problems for assistive technologies |
| aria-required-parent | critical | 1 | 0.1% | 1 | 1 | Ensure elements with an ARIA role that require parent roles are contained by them |
| html-has-lang | serious | 1 | 0.1% | 1 | 1 | Ensure every HTML document has a lang attribute |
| list | serious | 1 | 0.1% | 1 | 1 | Ensure that lists are structured correctly |
Most common axe best-practice failures
| Rule | Impact | Failures | % of failures | Seen in models | Seen in test cases | Description |
|---|---|---|---|---|---|---|
| region | moderate | 918 | 50.0% | 10 | 4 | Ensure all page content is contained by landmarks |
| landmark-one-main | moderate | 620 | 33.8% | 10 | 3 | Ensure the document has a main landmark |
| page-has-heading-one | moderate | 99 | 5.4% | 6 | 3 | Ensure that the page, or at least one of its frames contains a level-one heading |
| heading-order | moderate | 93 | 5.1% | 8 | 2 | Ensure the order of headings is semantically correct |
| aria-allowed-role | minor | 27 | 1.5% | 3 | 2 | Ensure role attribute has an appropriate value for the element |
| landmark-complementary-is-top-level | moderate | 24 | 1.3% | 2 | 1 | Ensure the complementary landmark or aside is at top level |
| landmark-unique | moderate | 18 | 1.0% | 3 | 3 | Ensure landmarks are unique |
| landmark-no-duplicate-banner | moderate | 15 | 0.8% | 2 | 2 | Ensure the document has at most one banner landmark |
| landmark-banner-is-top-level | moderate | 12 | 0.7% | 2 | 1 | Ensure the banner landmark is at top level |
| landmark-contentinfo-is-top-level | moderate | 5 | 0.3% | 1 | 1 | Ensure the contentinfo landmark is at top level |
Assertion-level patterns (per test case)
disclosure-widget
| Assertion | Type | Failure rate | Failures / total |
|---|---|---|---|
| All examples have a valid semantics | R | 48% | 120 / 250 |
| Collapsed content is hidden from assistive technology | R | 24% | 60 / 250 |
Assertion statistics are computed within this test case only and are not compared across different test cases.
modal-dialog
| Assertion | Type | Failure rate | Failures / total |
|---|---|---|---|
| Each modal dialog hides content behind it while open | R | 92% | 230 / 250 |
| Each modal dialog takes focus when opened | R | 80% | 201 / 250 |
| Each dialog can be closed by escape key | BP | 66% | 164 / 250 |
| Focus is not lost when each dialog closes | R | 64% | 161 / 250 |
| Each dialog has a dialog role | R | 59% | 148 / 250 |
Assertion statistics are computed within this test case only and are not compared across different test cases.
shopping-home-page
| Assertion | Type | Failure rate | Failures / total |
|---|---|---|---|
| Has a single maincontent | R | 63% | 157 / 250 |
| Has a single footer | R | 5% | 13 / 250 |
| Has a single banner | R | 4% | 11 / 250 |
| Has at least one navigation | R | 3% | 8 / 250 |
| Has at least one h2 | R | 2% | 5 / 250 |
Assertion statistics are computed within this test case only and are not compared across different test cases.
simple-contact-form
| Assertion | Type | Failure rate | Failures / total |
|---|---|---|---|
| Inputs use appropriate autocomplete for purpose | R | 87% | 217 / 250 |
| Helper text is programmatically associated | R | 76% | 190 / 250 |
| Required fields are indicated (visually and programmatically) | R | 28% | 71 / 250 |
| Visible label is included in accessible name | R | 0% | 1 / 250 |
| Each text input has an accessible name | R | 0% | 0 / 250 |
Assertion statistics are computed within this test case only and are not compared across different test cases.
Methodology
This report shows how well various LLMs generate accessible HTML.
- Each test uses a prompt to generate HTML. The generated HTML is then tested for accessibility.
- The prompts intentionally do not include specific accessibility instructions. The goal is to see if the LLMs produce accessible HTML by default.
- The resulting HTML is rendered in a browser via Playwright (Chromium). This allows the HTML's JavaScript and CSS to execute, which can impact accessibility.
- The rendered HTML is evaluated using axe-core to identify common accessibility issues.
- A custom test script (JavaScript) is executed against the rendered page to check for accessibility requirements that are specific to the test case and not covered by axe-core. These tests look for WCAG 2.2 failures and best practices. Best practices do not impact pass/fail results.
- Each test case is run multiple times (samples) to evaluate the consistency and reliability of the LLM's output.
- The temperature is set to 0.2 for models that support temperature. This is done to reflect how most tools set the temperature when generating code (usually between 0.0 and 0.2). Some Codex-style deployments reject sampling parameters like temperature; for those models, the harness does not send a temperature value.
- Instruction sets are also evaluated to see how specific accessibility instructions impact results. See the "Instruction sets" section for details.
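The temperature rule described above can be sketched as a small helper. This is a hypothetical illustration and not the harness's actual code: the function name `build_sampling_params` and the `supports_temperature` flag are assumptions, and only the 0.2 value comes from this report.

```python
CONTROL_TEMPERATURE = 0.2  # value stated in the methodology


def build_sampling_params(supports_temperature: bool) -> dict:
    """Return sampling parameters for one generation request (hypothetical helper)."""
    params = {}
    if supports_temperature:
        params["temperature"] = CONTROL_TEMPERATURE
    # For deployments that reject sampling parameters, no temperature key is sent.
    return params


print(build_sampling_params(True))   # {'temperature': 0.2}
print(build_sampling_params(False))  # {}
```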
System Prompt
You are generating a single standalone HTML document. Do NOT wrap output in markdown fences. Include <head> and <body>. Do NOT explain the code, just output it.
All tests are automatic and deterministic (no human intervention). Only a fraction of the accessibility requirements in WCAG can be covered this way; many requirements still need a human to evaluate. As such, these tests are not comprehensive. Even if a sample passes every test, it may still fail WCAG and contain serious accessibility issues.
Please leave feedback, review the source code, and contribute test cases, assertions, and other improvements at the GitHub Project.
Glossary
Column Definitions
- Rank: The position of the model when sorted by WCAG Pass Rate, highest pass rate first (a lower rank number is better).
- WCAG Pass Rate: The percentage of samples that passed all WCAG tests, including both axe-core WCAG checks and custom WCAG assertions. This does not include best practices.
- Avg Total WCAG Failures: The average number of total WCAG failures (axe-core + assertions) per sample for the model. This does not include best practices.
- Avg Axe WCAG Failures: The average number of axe-core detected WCAG failures per sample for the model. This does not include best practices.
- Avg Assertion WCAG Failures: The average number of custom WCAG assertion failures per sample for the model. This does not include best practices.
- Avg Best Practice Failures: The average number of best practice accessibility issues (informational only) per sample for the model. This includes axe-core best practices and best practice assertions.
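The column definitions above can be expressed as a short calculation. The sketch below is illustrative only: the `SampleResult` class, its field names, and the four sample values are invented for this example and are not taken from the benchmark's source or data.

```python
from dataclasses import dataclass


@dataclass
class SampleResult:
    axe_wcag_failures: int
    assertion_wcag_failures: int
    best_practice_failures: int  # informational only, never affects pass/fail

    @property
    def total_wcag_failures(self) -> int:
        return self.axe_wcag_failures + self.assertion_wcag_failures

    @property
    def passed(self) -> bool:
        # A sample passes only with zero axe-core and zero assertion WCAG failures.
        return self.total_wcag_failures == 0


def summarize(samples):
    """Compute the per-model summary columns from a list of sample results."""
    n = len(samples)
    return {
        "wcag_pass_rate": sum(s.passed for s in samples) / n,
        "avg_total_wcag_failures": sum(s.total_wcag_failures for s in samples) / n,
        "avg_axe_wcag_failures": sum(s.axe_wcag_failures for s in samples) / n,
        "avg_assertion_wcag_failures": sum(s.assertion_wcag_failures for s in samples) / n,
        "avg_best_practice_failures": sum(s.best_practice_failures for s in samples) / n,
    }


# Invented data: two passing samples (one with best-practice issues), two failing.
samples = [
    SampleResult(0, 0, 3),
    SampleResult(2, 1, 0),
    SampleResult(0, 1, 1),
    SampleResult(0, 0, 0),
]
summary = summarize(samples)
print(summary)
```

The first sample still counts as a pass even though it has three best-practice failures, which matches the definition of WCAG Pass Rate above.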
Other Glossary Terms
- Assertion: A specific accessibility check defined in the test script. Each assertion checks for a particular accessibility requirement or best practice for the specific test case which is not already tested by axe.
- Axe-core: An open-source accessibility testing engine developed by Deque Systems. It is widely used for automated accessibility testing of web applications.
- Pass@k: A metric that estimates the likelihood of at least one sample passing a test when k samples are randomly selected.
- WCAG: Web Content Accessibility Guidelines, a set of guidelines for making web content more accessible to people with disabilities.
- Test Case: A specific scenario designed to evaluate the accessibility of generated HTML content. Each test case includes a prompt, expected accessibility requirements, and a test script.
Change Log
2/2026 Update
- Test Cases: Added a test case for a simple contact form with assertions for simple form controls. Also fixed some minor bugs in other test cases.
- Instruction Sets: Added instruction set evaluation.
- Report: Updated report layout and added new sections for instruction sets and analysis. Also added filtering by instruction set and by specific assertions within test cases.
- Temperature: Set temperature to 0.2 for all models that accept a temperature setting, to better reflect typical code generation settings (it was previously left at the default value of 1.0). Because outputs are now more consistent, the total number of samples was also lowered.
Instruction Benchmarks (vs Control)
These results show how well each instruction set performs vs the control configuration (averaged across models). Instruction sets contain specific guidance intended to improve accessibility and are appended to the system prompt.
Several instruction sets are used in this benchmark to help identify which instructions are most effective at improving accessibility. Instruction sets are ranked by their average WCAG pass rate across all models and test cases.
Summary (ranked by avg WCAG pass rate)
| Rank | Instruction Set | Avg Control Pass Rate | Avg Instruction Set Pass Rate | Δ Avg Pass Rate |
|---|---|---|---|---|
| 1 | 2. Detailed Instructions | 10% | 62% | +51.3pp |
| 2 | 1. Basic | 10% | 49% | +38.8pp |
| 3 | 0. Minimal | 10% | 36% | +26.3pp |
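The Δ column is a difference in percentage points (pp), not a relative change. The sketch below reproduces the +51.3pp figure for the detailed instruction set from the per-model pass rates listed in the Results table of this report. Averaging the per-model rates is an assumption about how the report aggregates; with equal sample counts per model it gives the same result as pooling all samples.

```python
from statistics import mean

# Per-model pass rates in %, copied from the Results table.
control = {
    "Claude Haiku 4.5": 0, "Claude Opus 4.6": 0, "Claude Sonnet 4.5": 0,
    "DeepSeek V3.2": 0, "GPT-5 Mini": 30, "GPT-5.2": 38, "GPT-5.2 Codex": 24,
    "Gemini 3 Flash Preview": 1, "Gemini 3.5 Pro Preview": 9,
    "Grok 4 Fast Non-Reasoning": 0,
}
detailed = {
    "Claude Haiku 4.5": 35, "Claude Opus 4.6": 80, "Claude Sonnet 4.5": 70,
    "DeepSeek V3.2": 20, "GPT-5 Mini": 70, "GPT-5.2": 85, "GPT-5.2 Codex": 90,
    "Gemini 3 Flash Preview": 75, "Gemini 3.5 Pro Preview": 85,
    "Grok 4 Fast Non-Reasoning": 5,
}


def delta_pp(control_rates, variant_rates):
    """Difference between average variant and average control pass rate, in pp."""
    return mean(variant_rates.values()) - mean(control_rates.values())


delta = delta_pp(control, detailed)
print(f"control {mean(control.values()):.1f}%, "
      f"detailed {mean(detailed.values()):.1f}%, delta {delta:+.1f}pp")
# control 10.2%, detailed 61.5%, delta +51.3pp
```

This also explains why the rounded columns (10% and 62%) do not subtract to exactly the Δ shown: the difference is taken before rounding.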
Instruction benchmark details
This section includes per-model benchmark results and the full text of each instruction set.
Instruction sets
0. Minimal
Minimal reminder that all output must be accessible.
Variant samples per (test, model): 5
All output MUST be accessible.
1. Basic
Basic reminder that all output must be accessible (includes slightly more instructions than minimal).
Variant samples per (test, model): 5
All output MUST be accessible. Use semantic HTML first; only use ARIA when necessary, and ensure full keyboard support. Conform to [WCAG 2.2 Level AA](https://www.w3.org/TR/WCAG22/).
2. Detailed Instructions
Detailed instructions for accessibility.
Variant samples per (test, model): 5
# Accessibility instructions (detailed)
You are an expert in accessibility with deep software engineering expertise.
## Non-negotiables (MUST)
- Conform to [WCAG 2.2 Level AA](https://www.w3.org/TR/WCAG22/).
- Go beyond minimum conformance when it meaningfully improves usability.
- If the project uses a UI component library, you MUST use the component patterns as defined from the library. Do not recreate patterns.
- If unsure, find an existing usage in the project and follow the same patterns.
- Ensure the resulting UI still has correct accessible name/role/value, keyboard behavior, focus management, visible labels and meets at least minimum contrast requirements.
- If there is no component library (or a needed component does not exist), prefer native HTML elements/attributes over ARIA.
- Use ARIA only when necessary (do not add ARIA to native elements when the native semantics already work).
- Ensure correct accessible **name, role, value, states, and properties**.
- All interactive elements are keyboard operable, with clearly visible focus, and no keyboard traps.
- Do not claim the output is “fully accessible”.
## Inclusive language (MUST)
- Use respectful, inclusive, people-first language in any user-facing text.
- Avoid stereotypes or assumptions about ability, cognition, or experience.
## Cognitive load (SHOULD)
- Prefer plain language.
- Use consistent page structure (landmarks).
- Keep navigation order consistent.
- Keep the interface clean and simple (avoid unnecessary distractions).
## Structure and semantics
### Page structure (MUST)
- Use landmarks (`header`, `nav`, `main`, `footer`) appropriately.
- Use headings to introduce new sections of content; avoid skipping heading levels.
- Prefer one `h1` for the page topic. Generally, the first heading within the `main` element / landmark.
### Page title (SHOULD)
- Set a descriptive `<title>`.
- Prefer: “Unique page - section - site”.
## Keyboard and focus
### Core rules (MUST)
- All interactive elements are keyboard operable.
- Tab order follows reading order and is predictable.
- Focus is always visible.
- Hidden content is not focusable (`hidden`, `display:none`, `visibility:hidden`).
- If content is hidden from assistive technology by using `aria-hidden=true`, then neither that content nor any of its descendants can be focusable.
- Static content MUST NOT be tabbable.
- Exception: if an element needs programmatic focus, use `tabindex="-1"`.
### Skip link / bypass blocks (MUST)
Provide a skip link as the first focusable element.
```html
<header>
<a href="#maincontent" class="sr-only">Skip to main content</a>
<!-- header content -->
</header>
<nav>
<!-- navigation -->
</nav>
<main id="maincontent" tabindex="-1">
<h1><!-- page title --></h1>
<!-- content -->
</main>
```
```css
.sr-only:not(:focus):not(:active) {
clip: rect(0 0 0 0);
clip-path: inset(50%);
height: 1px;
overflow: hidden;
position: absolute;
white-space: nowrap;
width: 1px;
}
```
### Composite widgets (SHOULD)
If a component uses arrow-key navigation within itself (tabs, listbox, menu-like UI, grid/date picker):
- Provide one tab stop for the composite container or one child.
- Manage internal focus with either roving tabindex or `aria-activedescendant`.
Roving tabindex (SHOULD):
- Exactly one focusable item has `tabindex="0"`; all others are `-1`.
- Arrow keys move focus by swapping tabindex and calling `.focus()`.
`aria-activedescendant` (SHOULD):
- Container is implicitly focusable or has `tabindex="0"` and `aria-activedescendant="IDREF"`.
- Arrow keys update `aria-activedescendant`.
## Low vision and contrast (MUST)
### Contrast requirements (MUST)
- Text contrast: at least 4.5:1 (large text: 3:1).
- Large text is at least 24px regular or 18.66px bold.
- Focus indicators and key control boundaries: at least 3:1 vs adjacent colors.
- Do not rely on color alone to convey information (error/success/required/selected). Provide text and/or icons with accessible names.
### Color generation rules (MUST)
- Do not invent arbitrary colors.
- Use project-approved design tokens (CSS variables).
- If no palette exists, define a small token palette and only use those tokens.
- Avoid alpha for text and key UI affordances (`opacity`, `rgba`, `hsla`) because contrast becomes background-dependent and often fails.
- Ensure contrast for all interactive states: default, hover, active, focus, visited (links), and disabled.
### Safe defaults when unsure (SHOULD)
- Prefer very dark text on very light backgrounds, or the reverse.
- Avoid mid-gray text on white; muted text should still meet 4.5:1.
### Tokenized palette contract (SHOULD)
- Define and use tokens like: `--color-bg`, `--color-text`, `--color-muted-text`, `--color-link`, `--color-border`, `--color-focus`, `--color-danger`, `--color-success`.
- Only assign UI colors via these tokens (avoid scattered inline hex values).
### Verification (MUST)
Contrast verification is covered by the Final verification checklist.
## High contrast / forced colors mode (MUST)
### Support OS-level accessibility features (MUST)
- Never override or disrupt OS accessibility settings.
- The UI MUST adapt to High Contrast / Forced Colors mode automatically.
- Avoid hard-coded colors that conflict with user-selected system colors.
### Use the `forced-colors` media query when needed (SHOULD)
Use `@media (forced-colors: active)` only when system defaults are not sufficient.
```css
@media (forced-colors: active) {
/* Example: Replace box-shadow (suppressed in forced-colors) with a border */
.button {
border: 2px solid ButtonBorder;
}
}
/* if using box-shadow for a focus style, also use a transparent outline
so that the outline will render when the high contrast setting is enabled */
.btn:focus {
box-shadow: 0 0 4px 3px rgba(90, 50, 200, .7);
outline: 2px solid transparent;
}
```
In Forced Colors mode, avoid relying on:
- Box shadows
- Decorative gradients
### Respect user color schemes in forced colors (MUST)
- Use system color keywords (e.g., `ButtonText`, `ButtonBorder`, `CanvasText`, `Canvas`).
- Do not use fixed hex/RGB colors inside `@media (forced-colors: active)`.
### Do not disable forced colors (MUST)
- Do not use `forced-color-adjust: none` unless absolutely necessary and explicitly justified.
- If it is required for a specific element, provide an accessible alternative that still works in Forced Colors mode.
### Icons (MUST)
- Icons MUST adapt to text color.
- Prefer `currentColor` for SVG icon fills/strokes; avoid embedding fixed colors inside SVGs.
```css
svg {
fill: currentColor;
stroke: currentColor;
}
```
## Reflow (WCAG 2.2 SC 1.4.10) (MUST)
### Goal (MUST)
Multi-line text must be able to fit within 320px wide containers or viewports, so that users do not need to scroll in two-dimensions to read sections of content.
### Core principles (MUST)
- Preserve information and function: nothing essential is removed, obscured, or truncated.
- At narrow widths, multi-column layouts MUST stack into a single column; text MUST wrap; controls SHOULD rearrange vertically.
- Users MUST NOT need to scroll left/right to read multi-line text.
- If content is collapsed in the narrow layout, the full content/function MUST be available within 1 click (e.g., overflow menu, dialog, tooltip).
### Engineering requirements (MUST)
- Use responsive layout primitives (`flex`, `grid`) with fluid sizing; enable text wrapping.
- Avoid fixed widths that force two-dimensional scrolling at 320px.
- Avoid absolute positioning and `overflow: hidden` when it causes content loss, or would result in the obscuring of content at smaller viewport sizes.
- Media and containers SHOULD NOT overflow the viewport at 320px (for example, prefer `max-width: 100%` for images/video/canvas/iframes).
- In flex/grid layouts, ensure children can shrink/wrap (common fix: `min-width: 0` on flex/grid children).
- Handle long strings (URLs, tokens) without forcing overflow (common fix: `overflow-wrap: anywhere` or equivalent).
- Ensure all interactive elements remain visible, reachable, and operable at 320px.
### Exceptions (SHOULD)
If a component truly requires a two-dimensional layout for meaning/usage (e.g., large data tables, maps, diagrams, charts, games, presentations), allow horizontal scrolling only at the component level.
- The page as a whole MUST still reflow (unless the page layout truly requires two-dimensional layout for usage).
- The component MUST remain fully usable (all content reachable; controls operable).
## Controls and labels
### Visible labels (MUST)
- Every interactive element has a visible label.
- The label cannot disappear while entering text or after the field has a value.
### Voice access (MUST)
- The accessible name of each interactive element MUST contain the visible label.
- If using `aria-label`, include the visual label text.
- If multiple controls share the same visible label (e.g., many “Remove” buttons), use an `aria-label` that keeps the visible label text and adds context (e.g., “Remove item: Socks”).
## Forms
### Labels and help text (MUST)
- Every form control has a programmatic label.
- Prefer `<label for="...">`.
- Labels describe the input purpose.
- If help text exists, associate it with `aria-describedby`.
### Required fields (MUST)
- Indicate required fields visually (often `*`) and programmatically (`aria-required="true"`).
### Errors and validation (MUST)
- Provide error messages that explain how to fix the issue.
- Use `aria-invalid="true"` for invalid fields; remove it when valid.
- Associate inline errors with the field via `aria-describedby`.
- Submit buttons SHOULD NOT be disabled solely to prevent submission.
- On submit with invalid input, focus the first invalid control.
## Graphics and images
All graphics include `img`, `svg`, icon fonts, and emojis.
- Informative graphics MUST have meaningful alternatives.
- `img`: use `alt`.
- `svg`: prefer `role="img"` and `aria-label`/`aria-labelledby`.
- Decorative graphics MUST be hidden.
- `img`: `alt=""`.
- Other: `aria-hidden="true"`.
## Navigation and menus
- Use semantic navigation: `<nav>` with lists and links.
- Do not use `role="menu"` / `role="menubar"` for site navigation.
- For expandable navigation:
- Include button elements to toggle navigation and/or sub-navigations. Use `aria-expanded` on the button to indicate state.
- `Escape` MAY close open sub-navigations.
## Tables and grids
### Tables for static data (MUST)
- Use `<table>` for static tabular data.
- Use `<th>` to associate headers.
- Column headers are in the first row.
- Row headers (when present) use `<th>` in each row.
### Grids for dynamic UIs (SHOULD)
- Use grid roles only for truly interactive/dynamic experiences.
- If using `role="grid"`, grid cells MUST be nested in rows so header/cell relationships are determinable.
- Use arrow navigation to navigate within the grid.
## Final verification checklist (MUST)
Before finalizing output, explicitly verify:
- Structure and semantics: landmarks, headings, and one `h1` for the page topic.
- Keyboard and focus: operable controls, visible focus, predictable tab order, no traps, skip link works.
- Controls and labels: visible labels present and included in accessible names.
- Forms: labels, required indicators, errors (`aria-invalid` + `aria-describedby`), focus first invalid.
- Contrast: meets 4.5:1 / 3:1 thresholds, focus/boundaries meet 3:1, color not the only cue.
- Forced colors: does not break OS High Contrast / Forced Colors; uses system colors in `forced-colors: active`.
- Reflow: sections of content should be able to adjust to 320px width without the need for two-dimensional scrolling to read multi-line text; no content loss; controls remain operable.
- Graphics: informative alternatives; decorative graphics hidden.
- Tables/grids: tables use `<th>`; grids (when needed) are structured with rows and cells.
## Final note
Generate the HTML with accessibility in mind, but accessibility issues may still exist; manual review and testing (for example with Accessibility Insights) is still recommended.
Results
| Model | Instruction Set | Control Pass Rate | Instruction Set Pass Rate | Δ Pass Rate |
|---|---|---|---|---|
| Claude Haiku 4.5 | 0. Minimal | 0% | 0% | +0.0pp |
| Claude Haiku 4.5 | 1. Basic | 0% | 25% | +25.0pp |
| Claude Haiku 4.5 | 2. Detailed Instructions | 0% | 35% | +35.0pp |
| Claude Opus 4.6 | 0. Minimal | 0% | 15% | +15.0pp |
| Claude Opus 4.6 | 1. Basic | 0% | 25% | +25.0pp |
| Claude Opus 4.6 | 2. Detailed Instructions | 0% | 80% | +80.0pp |
| Claude Sonnet 4.5 | 0. Minimal | 0% | 10% | +10.0pp |
| Claude Sonnet 4.5 | 1. Basic | 0% | 40% | +40.0pp |
| Claude Sonnet 4.5 | 2. Detailed Instructions | 0% | 70% | +70.0pp |
| DeepSeek V3.2 | 0. Minimal | 0% | 5% | +5.0pp |
| DeepSeek V3.2 | 1. Basic | 0% | 5% | +5.0pp |
| DeepSeek V3.2 | 2. Detailed Instructions | 0% | 20% | +20.0pp |
| GPT-5 Mini | 0. Minimal | 30% | 50% | +20.0pp |
| GPT-5 Mini | 1. Basic | 30% | 50% | +20.0pp |
| GPT-5 Mini | 2. Detailed Instructions | 30% | 70% | +40.0pp |
| GPT-5.2 | 0. Minimal | 38% | 90% | +52.0pp |
| GPT-5.2 | 1. Basic | 38% | 100% | +62.0pp |
| GPT-5.2 | 2. Detailed Instructions | 38% | 85% | +47.0pp |
| GPT-5.2 Codex | 0. Minimal | 24% | 50% | +26.0pp |
| GPT-5.2 Codex | 1. Basic | 24% | 85% | +61.0pp |
| GPT-5.2 Codex | 2. Detailed Instructions | 24% | 90% | +66.0pp |
| Gemini 3 Flash Preview | 0. Minimal | 1% | 70% | +69.0pp |
| Gemini 3 Flash Preview | 1. Basic | 1% | 80% | +79.0pp |
| Gemini 3 Flash Preview | 2. Detailed Instructions | 1% | 75% | +74.0pp |
| Gemini 3.5 Pro Preview | 0. Minimal | 9% | 65% | +56.0pp |
| Gemini 3.5 Pro Preview | 1. Basic | 9% | 80% | +71.0pp |
| Gemini 3.5 Pro Preview | 2. Detailed Instructions | 9% | 85% | +76.0pp |
| Grok 4 Fast Non-Reasoning | 0. Minimal | 0% | 10% | +10.0pp |
| Grok 4 Fast Non-Reasoning | 1. Basic | 0% | 0% | +0.0pp |
| Grok 4 Fast Non-Reasoning | 2. Detailed Instructions | 0% | 5% | +5.0pp |
Instruction set analysis vs control
This section highlights where each instruction set helped (or hurt) compared to the control, aggregated across all samples for that instruction set.
0. Minimal — overall Δ pass rate +26.3pp
Overall: Control 10% (n=1000) → Variant 36% (n=200). Avg WCAG failures/sample: 8.35 → 6.17 (Δ -2.17).
Most improved test cases
| Test case | Control pass rate | Variant pass rate | Δ pass rate | Δ avg WCAG failures |
|---|---|---|---|---|
| simple-contact-form | 10% | 44% | +34.4pp | -1.73 |
| modal-dialog | 6% | 36% | +29.6pp | -3.20 |
| disclosure-widget | 23% | 52% | +29.2pp | -0.17 |
| shopping-home-page | 2% | 14% | +12.0pp | -3.59 |
Most reduced axe WCAG rules
| Rule | Control rate | Variant rate | Δ rate | Description |
|---|---|---|---|---|
| color-contrast | 73.5% | 53.0% | -20.5pp | Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds |
| link-name | 3.0% | 1.0% | -2.0pp | Ensure links have discernible text |
| button-name | 1.5% | 0.5% | -1.0pp | Ensure buttons have discernible text |
| aria-hidden-focus | 1.0% | 0.5% | -0.5pp | Ensure aria-hidden elements are not focusable nor contain focusable elements |
| aria-required-children | 0.3% | 0.0% | -0.3pp | Ensure elements with an ARIA role that require child roles contain them |
Most increased axe WCAG rules
| Rule | Control rate | Variant rate | Δ rate | Description |
|---|---|---|---|---|
| nested-interactive | 0.2% | 0.5% | +0.3pp | Ensure interactive controls are not nested as they are not always announced by screen readers or can cause focus problems for assistive technologies |
| aria-prohibited-attr | 0.8% | 1.0% | +0.2pp | Ensure ARIA attributes are not prohibited for an element's role |
Assertion analysis (vs control)
Failure rates are computed per assertion (within each test case) and compared between the variant and control.
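As a worked example of that comparison, the sketch below recomputes one row from the failures/total counts reported in the tables that follow. The function names are illustrative. Because the difference is taken on unrounded rates, the Δ can differ slightly from subtracting the rounded percentages.

```python
def fail_rate(failures: int, total: int) -> float:
    """Failure rate in percent for one assertion within one test case."""
    return 100 * failures / total


def delta_fail_rate_pp(control_failures, control_total, variant_failures, variant_total):
    """Variant failure rate minus control failure rate, in percentage points."""
    return fail_rate(variant_failures, variant_total) - fail_rate(control_failures, control_total)


# "Each modal dialog takes focus when opened": control 201 / 250, variant 1 / 50.
delta = delta_fail_rate_pp(201, 250, 1, 50)
print(f"{fail_rate(201, 250):.0f}% -> {fail_rate(1, 50):.0f}% ({delta:+.1f}pp)")
# 80% -> 2% (-78.4pp)
```

A negative Δ means the instruction set reduced failures for that assertion; a positive Δ indicates a regression.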
Most improved assertions
| Test case | Assertion | Type | Control fail rate | Variant fail rate | Δ fail rate | Control failures/total | Variant failures/total |
|---|---|---|---|---|---|---|---|
| modal-dialog | Each modal dialog takes focus when opened | R | 80% | 2% | -78.4pp | 201 / 250 | 1 / 50 |
| modal-dialog | Each dialog can be closed by escape key | BP | 66% | 0% | -65.6pp | 164 / 250 | 0 / 50 |
| modal-dialog | Focus is not lost when each dialog closes | R | 64% | 0% | -64.4pp | 161 / 250 | 0 / 50 |
| modal-dialog | Each dialog has a dialog role | R | 59% | 0% | -59.2pp | 148 / 250 | 0 / 50 |
| modal-dialog | Each modal dialog traps keyboard focus | R | 59% | 0% | -59.2pp | 148 / 250 | 0 / 50 |
| shopping-home-page | Has a single maincontent | R | 63% | 8% | -54.8pp | 157 / 250 | 4 / 50 |
| simple-contact-form | Helper text is programmatically associated | R | 76% | 34% | -42.0pp | 190 / 250 | 17 / 50 |
| simple-contact-form | Inputs use appropriate autocomplete for purpose | R | 87% | 46% | -40.8pp | 217 / 250 | 23 / 50 |
| disclosure-widget | All examples have a valid semantics | R | 48% | 8% | -40.0pp | 120 / 250 | 4 / 50 |
| modal-dialog | Each modal dialog hides content behind it while open | R | 92% | 60% | -32.0pp | 230 / 250 | 30 / 50 |
Most regressed assertions
| Test case | Assertion | Type | Control fail rate | Variant fail rate | Δ fail rate | Control failures/total | Variant failures/total |
|---|---|---|---|---|---|---|---|
| disclosure-widget | Collapsed content is hidden from assistive technology | R | 24% | 34% | +10.0pp | 60 / 250 | 17 / 50 |
All assertion deltas (per test case)
disclosure-widget
| Assertion | Type | Control fail rate | Variant fail rate | Δ fail rate | Control failures/total | Variant failures/total |
|---|---|---|---|---|---|---|
| Collapsed content is hidden from assistive technology | R | 24% | 34% | +10.0pp | 60 / 250 | 17 / 50 |
| All examples have a valid semantics | R | 48% | 8% | -40.0pp | 120 / 250 | 4 / 50 |
modal-dialog
| Assertion | Type | Control fail rate | Variant fail rate | Δ fail rate | Control failures/total | Variant failures/total |
|---|---|---|---|---|---|---|
| Each modal dialog hides content behind it while open | R | 92% | 60% | -32.0pp | 230 / 250 | 30 / 50 |
| Each modal dialog takes focus when opened | R | 80% | 2% | -78.4pp | 201 / 250 | 1 / 50 |
| Each dialog has a dialog role | R | 59% | 0% | -59.2pp | 148 / 250 | 0 / 50 |
| Each modal dialog traps keyboard focus | R | 59% | 0% | -59.2pp | 148 / 250 | 0 / 50 |
| Focus is not lost when each dialog closes | R | 64% | 0% | -64.4pp | 161 / 250 | 0 / 50 |
| Each dialog can be closed by escape key | BP | 66% | 0% | -65.6pp | 164 / 250 | 0 / 50 |
shopping-home-page
| Assertion | Type | Control fail rate | Variant fail rate | Δ fail rate | Control failures/total | Variant failures/total |
|---|---|---|---|---|---|---|
| Has a single maincontent | R | 63% | 8% | -54.8pp | 157 / 250 | 4 / 50 |
| Has a single banner | R | 4% | 4% | -0.4pp | 11 / 250 | 2 / 50 |
| Has single h1 | BP | 2% | 2% | +0.0pp | 5 / 250 | 1 / 50 |
| Has an h1 | R | 0% | 0% | -0.4pp | 1 / 250 | 0 / 50 |
| Has at least one h2 | R | 2% | 0% | -2.0pp | 5 / 250 | 0 / 50 |
| Has at least one navigation | R | 3% | 0% | -3.2pp | 8 / 250 | 0 / 50 |
| Has a single footer | R | 5% | 0% | -5.2pp | 13 / 250 | 0 / 50 |
simple-contact-form
| Assertion | Type | Control fail rate | Variant fail rate | Δ fail rate | Control failures/total | Variant failures/total |
|---|---|---|---|---|---|---|
| Inputs use appropriate autocomplete for purpose | R | 87% | 46% | -40.8pp | 217 / 250 | 23 / 50 |
| Helper text is programmatically associated | R | 76% | 34% | -42.0pp | 190 / 250 | 17 / 50 |
| Required fields are indicated (visually and programmatically) | R | 28% | 4% | -24.4pp | 71 / 250 | 2 / 50 |
| Each text input has an accessible name | R | 0% | 0% | +0.0pp | 0 / 250 | 0 / 50 |
| Each text input has textbox role | R | 0% | 0% | +0.0pp | 0 / 250 | 0 / 50 |
| Placeholder text is programmatically defined as a property | R | 0% | 0% | +0.0pp | 0 / 250 | 0 / 50 |
| Text inputs are keyboard focusable | R | 0% | 0% | +0.0pp | 0 / 250 | 0 / 50 |
| Visual labels are defined and persistent | R | 0% | 0% | +0.0pp | 0 / 250 | 0 / 50 |
| Visible label is included in accessible name | R | 0% | 0% | -0.4pp | 1 / 250 | 0 / 50 |
1. Basic — overall Δ pass rate +38.8pp
Overall: Control 10% (n=1000) → Variant 49% (n=200). Avg WCAG failures/sample: 8.35 → 3.17 (Δ -5.17).
Most improved test cases
| Test case | Control pass rate | Variant pass rate | Δ pass rate | Δ avg WCAG failures |
|---|---|---|---|---|
| simple-contact-form | 10% | 60% | +50.4pp | -2.11 |
| modal-dialog | 6% | 52% | +45.6pp | -4.32 |
| shopping-home-page | 2% | 32% | +30.0pp | -13.65 |
| disclosure-widget | 23% | 52% | +29.2pp | -0.61 |
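The overall figures can be cross-checked against the per-test-case rows above. The sketch below assumes each test case contributes the same number of samples (250 control and 50 variant, as the failures/total columns elsewhere in this report suggest), so the overall rate is the plain mean of the per-case rates. The unrounded control rates are recovered as variant rate minus Δ. `ROWS` and `overall_rates` are illustrative names, not part of the benchmark.

```python
# (variant pass rate %, delta pass rate in pp) per test case, copied from the table above.
# With 50 variant samples per case, the whole-percent variant rates are exact.
ROWS = {
    "simple-contact-form": (60.0, 50.4),
    "modal-dialog": (52.0, 45.6),
    "shopping-home-page": (32.0, 30.0),
    "disclosure-widget": (52.0, 29.2),
}


def overall_rates(rows):
    """Average per-case rates, assuming equal sample counts per test case."""
    variant = sum(v for v, _ in rows.values()) / len(rows)
    control = sum(v - d for v, d in rows.values()) / len(rows)  # control = variant - delta
    return control, variant


control, variant = overall_rates(ROWS)
# prints "control 10.2%, variant 49.0%, delta +38.8pp"
print(f"control {control:.1f}%, variant {variant:.1f}%, delta {variant - control:+.1f}pp")
```

This reproduces the +38.8pp headline. Subtracting the rounded rates (49% minus 10%) would give 39pp, so the headline is evidently computed from the unrounded rates.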
Most reduced axe WCAG rules
| Rule | Control rate | Variant rate | Δ rate | Description |
|---|---|---|---|---|
| color-contrast | 73.5% | 29.5% | -44.0pp | Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds |
| link-name | 3.0% | 0.0% | -3.0pp | Ensure links have discernible text |
| button-name | 1.5% | 0.0% | -1.5pp | Ensure buttons have discernible text |
| aria-hidden-focus | 1.0% | 0.0% | -1.0pp | Ensure aria-hidden elements are not focusable nor contain focusable elements |
| aria-prohibited-attr | 0.8% | 0.0% | -0.8pp | Ensure ARIA attributes are not prohibited for an element's role |
Most increased axe WCAG rules
| Rule | Control rate | Variant rate | Δ rate | Description |
|---|---|---|---|---|
| link-in-text-block | 0.0% | 0.5% | +0.5pp | Ensure links are distinguished from surrounding text in a way that does not rely on color |
| aria-required-children | 0.3% | 0.5% | +0.2pp | Ensure elements with an ARIA role that require child roles contain them |
Assertion analysis (vs control)
Failure rates are computed per assertion (within each test case) and compared between the variant and control.
Most improved assertions
| Test case | Assertion | Type | Control fail rate | Variant fail rate | Δ fail rate | Control failures/total | Variant failures/total |
|---|---|---|---|---|---|---|---|
| modal-dialog | Each modal dialog takes focus when opened | R | 80% | 2% | -78.4pp | 201 / 250 | 1 / 50 |
| modal-dialog | Each dialog can be closed by escape key | BP | 66% | 0% | -65.6pp | 164 / 250 | 0 / 50 |
| modal-dialog | Focus is not lost when each dialog closes | R | 64% | 0% | -64.4pp | 161 / 250 | 0 / 50 |
| shopping-home-page | Has a single maincontent | R | 63% | 0% | -62.8pp | 157 / 250 | 0 / 50 |
| simple-contact-form | Helper text is programmatically associated | R | 76% | 16% | -60.0pp | 190 / 250 | 8 / 50 |
| modal-dialog | Each dialog has a dialog role | R | 59% | 0% | -59.2pp | 148 / 250 | 0 / 50 |
| modal-dialog | Each modal dialog traps keyboard focus | R | 59% | 2% | -57.2pp | 148 / 250 | 1 / 50 |
| simple-contact-form | Inputs use appropriate autocomplete for purpose | R | 87% | 36% | -50.8pp | 217 / 250 | 18 / 50 |
| modal-dialog | Each modal dialog hides content behind it while open | R | 92% | 42% | -50.0pp | 230 / 250 | 21 / 50 |
| disclosure-widget | All examples have a valid semantics | R | 48% | 8% | -40.0pp | 120 / 250 | 4 / 50 |
Most regressed assertions
| Test case | Assertion | Type | Control fail rate | Variant fail rate | Δ fail rate | Control failures/total | Variant failures/total |
|---|---|---|---|---|---|---|---|
| disclosure-widget | Collapsed content is hidden from assistive technology | R | 24% | 32% | +8.0pp | 60 / 250 | 16 / 50 |
| shopping-home-page | Has an h1 | R | 0% | 6% | +5.6pp | 1 / 250 | 3 / 50 |
| shopping-home-page | Has single h1 | BP | 2% | 6% | +4.0pp | 5 / 250 | 3 / 50 |
| simple-contact-form | Visible label is included in accessible name | R | 0% | 2% | +1.6pp | 1 / 250 | 1 / 50 |
All assertion deltas (per test case)
disclosure-widget
| Assertion | Type | Control fail rate | Variant fail rate | Δ fail rate | Control failures/total | Variant failures/total |
|---|---|---|---|---|---|---|
| Collapsed content is hidden from assistive technology | R | 24% | 32% | +8.0pp | 60 / 250 | 16 / 50 |
| All examples have a valid semantics | R | 48% | 8% | -40.0pp | 120 / 250 | 4 / 50 |
modal-dialog
| Assertion | Type | Control fail rate | Variant fail rate | Δ fail rate | Control failures/total | Variant failures/total |
|---|---|---|---|---|---|---|
| Each modal dialog hides content behind it while open | R | 92% | 42% | -50.0pp | 230 / 250 | 21 / 50 |
| Each modal dialog traps keyboard focus | R | 59% | 2% | -57.2pp | 148 / 250 | 1 / 50 |
| Each modal dialog takes focus when opened | R | 80% | 2% | -78.4pp | 201 / 250 | 1 / 50 |
| Each dialog has a dialog role | R | 59% | 0% | -59.2pp | 148 / 250 | 0 / 50 |
| Focus is not lost when each dialog closes | R | 64% | 0% | -64.4pp | 161 / 250 | 0 / 50 |
| Each dialog can be closed by escape key | BP | 66% | 0% | -65.6pp | 164 / 250 | 0 / 50 |
shopping-home-page
| Assertion | Type | Control fail rate | Variant fail rate | Δ fail rate | Control failures/total | Variant failures/total |
|---|---|---|---|---|---|---|
| Has an h1 | R | 0% | 6% | +5.6pp | 1 / 250 | 3 / 50 |
| Has single h1 | BP | 2% | 6% | +4.0pp | 5 / 250 | 3 / 50 |
| Has a single banner | R | 4% | 4% | -0.4pp | 11 / 250 | 2 / 50 |
| Has at least one h2 | R | 2% | 0% | -2.0pp | 5 / 250 | 0 / 50 |
| Has at least one navigation | R | 3% | 0% | -3.2pp | 8 / 250 | 0 / 50 |
| Has a single footer | R | 5% | 0% | -5.2pp | 13 / 250 | 0 / 50 |
| Has a single maincontent | R | 63% | 0% | -62.8pp | 157 / 250 | 0 / 50 |
simple-contact-form
| Assertion | Type | Control fail rate | Variant fail rate | Δ fail rate | Control failures/total | Variant failures/total |
|---|---|---|---|---|---|---|
| Inputs use appropriate autocomplete for purpose | R | 87% | 36% | -50.8pp | 217 / 250 | 18 / 50 |
| Helper text is programmatically associated | R | 76% | 16% | -60.0pp | 190 / 250 | 8 / 50 |
| Required fields are indicated (visually and programmatically) | R | 28% | 4% | -24.4pp | 71 / 250 | 2 / 50 |
| Visible label is included in accessible name | R | 0% | 2% | +1.6pp | 1 / 250 | 1 / 50 |
| Each text input has an accessible name | R | 0% | 0% | +0.0pp | 0 / 250 | 0 / 50 |
| Each text input has textbox role | R | 0% | 0% | +0.0pp | 0 / 250 | 0 / 50 |
| Placeholder text is programmatically defined as a property | R | 0% | 0% | +0.0pp | 0 / 250 | 0 / 50 |
| Text inputs are keyboard focusable | R | 0% | 0% | +0.0pp | 0 / 250 | 0 / 50 |
| Visual labels are defined and persistent | R | 0% | 0% | +0.0pp | 0 / 250 | 0 / 50 |
2. Detailed Instructions — overall Δ pass rate +51.3pp
Overall: Control 10% (n=1000) → Variant 62% (n=200). Avg WCAG failures/sample: 8.35 → 1.33 (Δ -7.01).
Most improved test cases
| Test case | Control pass rate | Variant pass rate | Δ pass rate | Δ avg WCAG failures |
|---|---|---|---|---|
| shopping-home-page | 2% | 70% | +68.0pp | -20.89 |
| disclosure-widget | 23% | 74% | +51.2pp | -0.99 |
| simple-contact-form | 10% | 60% | +50.4pp | -2.03 |
| modal-dialog | 6% | 42% | +35.6pp | -4.14 |
Most reduced axe WCAG rules
| Rule | Control rate | Variant rate | Δ rate | Description |
|---|---|---|---|---|
| color-contrast | 73.5% | 13.5% | -60.0pp | Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds |
| link-name | 3.0% | 0.0% | -3.0pp | Ensure links have discernible text |
| button-name | 1.5% | 0.0% | -1.5pp | Ensure buttons have discernible text |
| aria-prohibited-attr | 0.8% | 0.0% | -0.8pp | Ensure ARIA attributes are not prohibited for an element's role |
| aria-required-children | 0.3% | 0.0% | -0.3pp | Ensure elements with an ARIA role that require child roles contain them |
Most increased axe WCAG rules
| Rule | Control rate | Variant rate | Δ rate | Description |
|---|---|---|---|---|
| listitem | 0.0% | 0.5% | +0.5pp | Ensure <li> elements are used semantically |
| aria-hidden-focus | 1.0% | 1.5% | +0.5pp | Ensure aria-hidden elements are not focusable nor contain focusable elements |
| nested-interactive | 0.2% | 0.5% | +0.3pp | Ensure interactive controls are not nested as they are not always announced by screen readers or can cause focus problems for assistive technologies |
Assertion analysis (vs control)
Failure rates are computed per assertion (within each test case) and compared between the variant and control.
Most improved assertions
| Test case | Assertion | Type | Control fail rate | Variant fail rate | Δ fail rate | Control failures/total | Variant failures/total |
|---|---|---|---|---|---|---|---|
| shopping-home-page | Has a single maincontent | R | 63% | 0% | -62.8pp | 157 / 250 | 0 / 50 |
| simple-contact-form | Helper text is programmatically associated | R | 76% | 14% | -62.0pp | 190 / 250 | 7 / 50 |
| modal-dialog | Focus is not lost when each dialog closes | R | 64% | 4% | -60.4pp | 161 / 250 | 2 / 50 |
| modal-dialog | Each dialog can be closed by escape key | BP | 66% | 6% | -59.6pp | 164 / 250 | 3 / 50 |
| modal-dialog | Each modal dialog takes focus when opened | R | 80% | 24% | -56.4pp | 201 / 250 | 12 / 50 |
| modal-dialog | Each dialog has a dialog role | R | 59% | 4% | -55.2pp | 148 / 250 | 2 / 50 |
| modal-dialog | Each modal dialog traps keyboard focus | R | 59% | 4% | -55.2pp | 148 / 250 | 2 / 50 |
| simple-contact-form | Inputs use appropriate autocomplete for purpose | R | 87% | 36% | -50.8pp | 217 / 250 | 18 / 50 |
| disclosure-widget | All examples have a valid semantics | R | 48% | 6% | -42.0pp | 120 / 250 | 3 / 50 |
| modal-dialog | Each modal dialog hides content behind it while open | R | 92% | 56% | -36.0pp | 230 / 250 | 28 / 50 |
All assertion deltas (per test case)
disclosure-widget
| Assertion | Type | Control fail rate | Variant fail rate | Δ fail rate | Control failures/total | Variant failures/total |
|---|---|---|---|---|---|---|
| Collapsed content is hidden from assistive technology | R | 24% | 20% | -4.0pp | 60 / 250 | 10 / 50 |
| All examples have a valid semantics | R | 48% | 6% | -42.0pp | 120 / 250 | 3 / 50 |
modal-dialog
| Assertion | Type | Control fail rate | Variant fail rate | Δ fail rate | Control failures/total | Variant failures/total |
|---|---|---|---|---|---|---|
| Each modal dialog hides content behind it while open | R | 92% | 56% | -36.0pp | 230 / 250 | 28 / 50 |
| Each modal dialog takes focus when opened | R | 80% | 24% | -56.4pp | 201 / 250 | 12 / 50 |
| Each dialog can be closed by escape key | BP | 66% | 6% | -59.6pp | 164 / 250 | 3 / 50 |
| Each dialog has a dialog role | R | 59% | 4% | -55.2pp | 148 / 250 | 2 / 50 |
| Each modal dialog traps keyboard focus | R | 59% | 4% | -55.2pp | 148 / 250 | 2 / 50 |
| Focus is not lost when each dialog closes | R | 64% | 4% | -60.4pp | 161 / 250 | 2 / 50 |
shopping-home-page
| Assertion | Type | Control fail rate | Variant fail rate | Δ fail rate | Control failures/total | Variant failures/total |
|---|---|---|---|---|---|---|
| Has a single banner | R | 4% | 4% | -0.4pp | 11 / 250 | 2 / 50 |
| Has an h1 | R | 0% | 0% | -0.4pp | 1 / 250 | 0 / 50 |
| Has at least one h2 | R | 2% | 0% | -2.0pp | 5 / 250 | 0 / 50 |
| Has single h1 | BP | 2% | 0% | -2.0pp | 5 / 250 | 0 / 50 |
| Has at least one navigation | R | 3% | 0% | -3.2pp | 8 / 250 | 0 / 50 |
| Has a single footer | R | 5% | 0% | -5.2pp | 13 / 250 | 0 / 50 |
| Has a single maincontent | R | 63% | 0% | -62.8pp | 157 / 250 | 0 / 50 |
simple-contact-form
| Assertion | Type | Control fail rate | Variant fail rate | Δ fail rate | Control failures/total | Variant failures/total |
|---|---|---|---|---|---|---|
| Inputs use appropriate autocomplete for purpose | R | 87% | 36% | -50.8pp | 217 / 250 | 18 / 50 |
| Helper text is programmatically associated | R | 76% | 14% | -62.0pp | 190 / 250 | 7 / 50 |
| Each text input has an accessible name | R | 0% | 0% | +0.0pp | 0 / 250 | 0 / 50 |
| Each text input has textbox role | R | 0% | 0% | +0.0pp | 0 / 250 | 0 / 50 |
| Placeholder text is programmatically defined as a property | R | 0% | 0% | +0.0pp | 0 / 250 | 0 / 50 |
| Text inputs are keyboard focusable | R | 0% | 0% | +0.0pp | 0 / 250 | 0 / 50 |
| Visual labels are defined and persistent | R | 0% | 0% | +0.0pp | 0 / 250 | 0 / 50 |
| Visible label is included in accessible name | R | 0% | 0% | -0.4pp | 1 / 250 | 0 / 50 |
| Required fields are indicated (visually and programmatically) | R | 28% | 0% | -28.4pp | 71 / 250 | 0 / 50 |
Detailed Results

disclosure-widget
Prompt
- Generate an HTML file that demonstrates an expand/collapse widget.
- Wrap each widget with a div that has an `example` class.
- Give the container for controlled content a `details` class.
DeepSeek V3.2
Pass rate per instruction set: 0% | 20% | 0% | 80%
Pass@k aggregates for DeepSeek V3.2, one block per instruction set:
Samples: 25 | Passes: 0
| pass@1 | pass@5 | pass@10 |
|---|---|---|
| 0% | 0% | 0% |
Samples: 5 | Passes: 1
| pass@1 | pass@5 | pass@10 |
|---|---|---|
| 20% | 100% | 100% |
Samples: 5 | Passes: 0
| pass@1 | pass@5 | pass@10 |
|---|---|---|
| 0% | 0% | 0% |
Samples: 5 | Passes: 4
| pass@1 | pass@5 | pass@10 |
|---|---|---|
| 80% | 100% | 100% |
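The pass@k values in the aggregates above are consistent with the commonly used unbiased estimator 1 − C(n−c, k) / C(n, k), where n is the number of samples and c the number of passes. The report does not state its formula, so this is a sketch under assumptions. In particular, clamping k to n is a guess that happens to reproduce the 0% pass@10 shown for 5 samples with 0 passes.

```python
from math import comb


def pass_at_k(n, c, k):
    """Estimate P(at least one of k drawn samples passes) from n samples with c passes."""
    k = min(k, n)  # assumption: k is clamped when fewer than k samples exist
    # comb(n - c, k) is 0 when fewer than k failing samples exist, giving 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)


# (samples, passes) pairs from the four aggregate blocks above
for n, c in [(25, 0), (5, 1), (5, 0), (5, 4)]:
    print(n, c, [f"{pass_at_k(n, c, k):.0%}" for k in (1, 5, 10)])
# 25 0 ['0%', '0%', '0%']
# 5 1 ['20%', '100%', '100%']
# 5 0 ['0%', '0%', '0%']
# 5 4 ['80%', '100%', '100%']
```

With only 5 samples, pass@5 and pass@10 reach 100% as soon as a single sample passes, so these columns say little for the small instruction-set runs.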
Sample 0 (DeepSeek V3.2)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 3 | BP: 5
Assertions ❌
- ✅: All examples have a valid semantics (R): pass
- ❌: Collapsed content is hidden from assistive technology (R): fail
Axe WCAG Failures (3) ❌
- (3x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (5) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 1 (DeepSeek V3.2)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 3 | BP: 6
Assertions ❌
- ✅: All examples have a valid semantics (R): pass
- ❌: Collapsed content is hidden from assistive technology (R): fail
Axe WCAG Failures (3) ❌
- (3x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (6) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- landmark-unique (moderate): Ensure landmarks are unique (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 2 (DeepSeek V3.2)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 3 | BP: 2
Assertions ❌
- ❌: All examples have a valid semantics (R): fail
- ✅: Collapsed content is hidden from assistive technology (R): pass
Axe WCAG Failures (3) ❌
- (3x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (2) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 3 (DeepSeek V3.2)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 3 | BP: 6
Assertions ✅
- ✅: All examples have a valid semantics (R): pass
- ✅: Collapsed content is hidden from assistive technology (R): pass
Axe WCAG Failures (3) ❌
- (3x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (6) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 4 (DeepSeek V3.2)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 3 | BP: 5
Assertions ❌
- ❌: All examples have a valid semantics (R): fail
- ✅: Collapsed content is hidden from assistive technology (R): pass
Axe WCAG Failures (3) ❌
- (3x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (5) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 5 (DeepSeek V3.2)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 3 | BP: 8
Assertions ❌
- ✅: All examples have a valid semantics (R): pass
- ❌: Collapsed content is hidden from assistive technology (R): fail
Axe WCAG Failures (3) ❌
- (3x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (8) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 6 (DeepSeek V3.2)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 3 | BP: 5
Assertions ✅
- ✅: All examples have a valid semantics (R): pass
- ✅: Collapsed content is hidden from assistive technology (R): pass
Axe WCAG Failures (3) ❌
- (3x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (5) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 7 (DeepSeek V3.2)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 3 | BP: 5
Assertions ❌
- ❌: All examples have a valid semantics (R): fail
- ✅: Collapsed content is hidden from assistive technology (R): pass
Axe WCAG Failures (3) ❌
- (3x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (5) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 8 (DeepSeek V3.2)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 0 | BP: 8
Assertions ❌
- ✅: All examples have a valid semantics (R): pass
- ❌: Collapsed content is hidden from assistive technology (R): fail
Axe Best Practice Issues (8) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 9 (DeepSeek V3.2)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 3 | BP: 5
Assertions ❌
- ❌: All examples have a valid semantics (R): fail
- ✅: Collapsed content is hidden from assistive technology (R): pass
Axe WCAG Failures (3) ❌
- (3x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (5) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 10 (DeepSeek V3.2)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 0 | BP: 5
Assertions ❌
- ❌: All examples have a valid semantics (R): fail
- ✅: Collapsed content is hidden from assistive technology (R): pass
Axe Best Practice Issues (5) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 11 (DeepSeek V3.2)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 2 | BP: 4
Assertions ✅
- ✅: All examples have a valid semantics (R): pass
- ✅: Collapsed content is hidden from assistive technology (R): pass
Axe WCAG Failures (2) ❌
- (2x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (4) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 12 (DeepSeek V3.2)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 3 | BP: 5
Assertions ✅
- ✅: All examples have a valid semantics (R): pass
- ✅: Collapsed content is hidden from assistive technology (R): pass
Axe WCAG Failures (3) ❌
- (3x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (5) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 13 (DeepSeek V3.2)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 3 | BP: 5
Assertions ❌
- ❌: All examples have a valid semantics (R): fail
- ✅: Collapsed content is hidden from assistive technology (R): pass
Axe WCAG Failures (3) ❌
- (3x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (5) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 14 (DeepSeek V3.2)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 3 | BP: 5
Assertions ❌
- ❌: All examples have a valid semantics (R): fail
- ✅: Collapsed content is hidden from assistive technology (R): pass
Axe WCAG Failures (3) ❌
- (3x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (5) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 15 (DeepSeek V3.2)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 3 | BP: 2
Assertions ❌
- ❌: All examples have a valid semantics (R): fail
- ✅: Collapsed content is hidden from assistive technology (R): pass
Axe WCAG Failures (3) ❌
- (3x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (2) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 16 (DeepSeek V3.2)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 3 | BP: 2
Assertions ❌
- ❌: All examples have a valid semantics (R): fail
- ✅: Collapsed content is hidden from assistive technology (R): pass
Axe WCAG Failures (3) ❌
- (3x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (2) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 17 (DeepSeek V3.2)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 0 | BP: 5
Assertions ❌
- ❌: All examples have a valid semantics (R): fail
- ✅: Collapsed content is hidden from assistive technology (R): pass
Axe Best Practice Issues (5) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 18 (DeepSeek V3.2)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 3 | BP: 5
Assertions ✅
- ✅: All examples have a valid semantics (R): pass
- ✅: Collapsed content is hidden from assistive technology (R): pass
Axe WCAG Failures (3) ❌
- (3x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (5) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 19 (DeepSeek V3.2)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 2 | BP: 6
Assertions ❌
- ✅: All examples have a valid semantics (R): pass
- ❌: Collapsed content is hidden from assistive technology (R): fail
Axe WCAG Failures (2) ❌
- (2x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (6) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 20 (DeepSeek V3.2)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 0 | BP: 8
Assertions ❌
- ✅: All examples have a valid semantics (R): pass
- ❌: Collapsed content is hidden from assistive technology (R): fail
Axe Best Practice Issues (8) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 21 (DeepSeek V3.2)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 3 | BP: 5
Assertions ✅
- ✅: All examples have a valid semantics (R): pass
- ✅: Collapsed content is hidden from assistive technology (R): pass
Axe WCAG Failures (3) ❌
- (3x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (5) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 22 (DeepSeek V3.2)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 3 | BP: 8
Assertions ❌
- ✅: All examples have a valid semantics (R): pass
- ❌: Collapsed content is hidden from assistive technology (R): fail
Axe WCAG Failures (3) ❌
- (3x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (8) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 23 (DeepSeek V3.2)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 0 | BP: 5
Assertions ❌
- ❌: All examples have a valid semantics (R): fail
- ✅: Collapsed content is hidden from assistive technology (R): pass
Axe Best Practice Issues (5) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 24 (DeepSeek V3.2)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 3 | BP: 6
Assertions ❌
- ✅: All examples have a valid semantics (R): pass
- ❌: Collapsed content is hidden from assistive technology (R): fail
Axe WCAG Failures (3) ❌
- (3x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (6) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- landmark-unique (moderate): Ensure landmarks are unique (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
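The control samples above suggest how the per-sample verdict is derived. Sample 3 passes both assertions but still reads FAIL because of its 3 axe WCAG failures, and Sample 8 has 0 axe WCAG failures but fails an assertion. The sketch below encodes that inferred rule; it is not the benchmark's actual code. `Assertion` and `sample_passes` are hypothetical names, and ignoring BP-type assertions is an assumption carried over from the "does not affect pass/fail" note on axe best-practice issues.

```python
from dataclasses import dataclass


@dataclass
class Assertion:
    name: str
    kind: str     # "R" or "BP", as in the Type column of the assertion tables
    passed: bool


def sample_passes(axe_wcag_failures, assertions):
    """Inferred rule: no axe WCAG failures and every R-type assertion passes."""
    required_ok = all(a.passed for a in assertions if a.kind == "R")
    return axe_wcag_failures == 0 and required_ok


semantics_ok = Assertion("All examples have a valid semantics", "R", True)
hidden_ok = Assertion("Collapsed content is hidden from assistive technology", "R", True)
hidden_fail = Assertion("Collapsed content is hidden from assistive technology", "R", False)

print(sample_passes(3, [semantics_ok, hidden_ok]))    # Sample 3: False (axe WCAG failures)
print(sample_passes(0, [semantics_ok, hidden_fail]))  # Sample 8: False (assertion failure)
print(sample_passes(0, [semantics_ok, hidden_ok]))    # hypothetical clean sample: True
```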
Sample 0 (DeepSeek V3.2)
Instruction set: 1. Basic
FAIL | Latency 0.00s
Axe WCAG: 3 | BP: 5
Assertions ✅
- ✅: All examples have a valid semantics (R): pass
- ✅: Collapsed content is hidden from assistive technology (R): pass
Axe WCAG Failures (3) ❌
- (3x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (5) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 1 (DeepSeek V3.2)
Instruction set: 1. Basic
FAIL | Latency 0.00s
Axe WCAG: 3 | BP: 5
Assertions ✅
- ✅: All examples have a valid semantics (R): pass
- ✅: Collapsed content is hidden from assistive technology (R): pass
Axe WCAG Failures (3) ❌
- (3x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (5) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 2 (DeepSeek V3.2)
Instruction set: 1. Basic
FAIL | Latency 0.00s
Axe WCAG: 3 | BP: 5
Assertions ✅
- ✅: All examples have a valid semantics (R): pass
- ✅: Collapsed content is hidden from assistive technology (R): pass
Axe WCAG Failures (3) ❌
- (3x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (5) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 3 (DeepSeek V3.2)
Instruction set: 1. Basic
FAIL | Latency 0.00s
Axe WCAG: 3 | BP: 5
Assertions ✅
- ✅: All examples have a valid semantics (R): pass
- ✅: Collapsed content is hidden from assistive technology (R): pass
Axe WCAG Failures (3) ❌
- (3x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (5) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 4 (DeepSeek V3.2)
Instruction set: 1. Basic
FAIL | Latency 0.00s
Axe WCAG: 3 | BP: 5
Assertions ❌
- ✅: All examples have a valid semantics (R): pass
- ❌: Collapsed content is hidden from assistive technology (R): fail
Axe WCAG Failures (3) ❌
- (3x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (5) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 0 (DeepSeek V3.2)
Instruction set: 0. Minimal
FAIL | Latency 0.00s
Axe WCAG: 3 | BP: 5
Assertions ✅
- ✅: All examples have a valid semantics (R): pass
- ✅: Collapsed content is hidden from assistive technology (R): pass
Axe WCAG Failures (3) ❌
- (3x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (5) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 1 (DeepSeek V3.2)
Instruction set: 0. Minimal
FAIL | Latency 0.00s
Axe WCAG: 2 | BP: 2
Assertions ✅
- ✅: All examples have a valid semantics (R): pass
- ✅: Collapsed content is hidden from assistive technology (R): pass
Axe WCAG Failures (2) ❌
- (2x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (2) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 2 (DeepSeek V3.2)
Instruction set: 0. Minimal
PASS | Latency 0.00s
Axe WCAG: 0 | BP: 2
Assertions ✅
- ✅: All examples have a valid semantics (R): pass
- ✅: Collapsed content is hidden from assistive technology (R): pass
Axe Best Practice Issues (2) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
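The verdicts in these listings are consistent with a simple rule: a sample passes only when it has zero Axe WCAG failures and every assertion passes, while best-practice issues are reported but ignored. The sketch below illustrates that rule. It is inferred from the listed results rather than taken from the test harness, and the names `SampleResult` and `verdict` are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class SampleResult:
    """Hypothetical record of one sample, mirroring the fields shown in the listing."""
    axe_wcag_failures: int
    best_practice_issues: int
    assertions: dict[str, bool]


def verdict(sample: SampleResult) -> str:
    """Return PASS only if there are no Axe WCAG failures and all assertions pass.

    Best-practice issues are deliberately not consulted, matching the
    "does not affect pass/fail" note in the listing.
    """
    wcag_clean = sample.axe_wcag_failures == 0
    assertions_ok = all(sample.assertions.values())
    return "PASS" if wcag_clean and assertions_ok else "FAIL"


# Sample 2 (0. Minimal): Axe WCAG 0, BP 2, both assertions pass.
clean = SampleResult(0, 2, {"valid semantics": True, "collapsed content hidden": True})
# Sample 0 (0. Minimal): Axe WCAG 3, BP 5, both assertions pass.
contrast = SampleResult(3, 5, {"valid semantics": True, "collapsed content hidden": True})

print(verdict(clean))     # PASS
print(verdict(contrast))  # FAIL
```

Under this reading, a sample with clean assertions still fails on color-contrast violations alone, which matches most of the DeepSeek V3.2 failures in the Basic and Minimal sets.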
Sample 3 (DeepSeek V3.2)
Instruction set: 0. Minimal
FAIL | Latency 0.00s
Axe WCAG: 3 | BP: 3
Assertions ❌
- ✅: All examples have a valid semantics (R): pass
- ❌: Collapsed content is hidden from assistive technology (R): fail
Axe WCAG Failures (3) ❌
- (3x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (3) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- landmark-unique (moderate): Ensure landmarks are unique (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 4 (DeepSeek V3.2)
Instruction set: 0. Minimal
FAIL | Latency 0.00s
Axe WCAG: 2 | BP: 2
Assertions ❌
- ❌: All examples have a valid semantics (R): fail
- ✅: Collapsed content is hidden from assistive technology (R): pass
Axe WCAG Failures (2) ❌
- (2x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (2) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 0 (DeepSeek V3.2)
Instruction set: 2. Detailed Instructions
PASS | Latency 0.00s
Axe WCAG: 0
Assertions ✅
- ✅: All examples have a valid semantics (R): pass
- ✅: Collapsed content is hidden from assistive technology (R): pass
Sample 1 (DeepSeek V3.2)
Instruction set: 2. Detailed Instructions
FAIL | Latency 0.00s
Axe WCAG: 0
Assertions ❌
- ✅: All examples have a valid semantics (R): pass
- ❌: Collapsed content is hidden from assistive technology (R): fail - elementHandle.$eval: Failed to find element matching selector ".details"
Sample 2 (DeepSeek V3.2)
Instruction set: 2. Detailed Instructions
PASS | Latency 0.00s
Axe WCAG: 0
Assertions ✅
- ✅: All examples have a valid semantics (R): pass
- ✅: Collapsed content is hidden from assistive technology (R): pass
Sample 3 (DeepSeek V3.2)
Instruction set: 2. Detailed Instructions
PASS | Latency 0.00s
Axe WCAG: 0
Assertions ✅
- ✅: All examples have a valid semantics (R): pass
- ✅: Collapsed content is hidden from assistive technology (R): pass
Sample 4 (DeepSeek V3.2)
Instruction set: 2. Detailed Instructions
PASS | Latency 0.00s
Axe WCAG: 0
Assertions ✅
- ✅: All examples have a valid semantics (R): pass
- ✅: Collapsed content is hidden from assistive technology (R): pass
Claude Haiku 4.5
Aggregates per instruction set (one row per set):
| Samples | Passes | pass@1 | pass@5 | pass@10 |
|---|---|---|---|---|
| 25 | 0 | 0% | 0% | 0% |
| 5 | 0 | 0% | 0% | 0% |
| 5 | 0 | 0% | 0% | 0% |
| 5 | 1 | 20% | 100% | 100% |
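The pass@k figures above are consistent with the standard combinatorial estimator: with n samples and c passes, pass@k is one minus the probability that k samples drawn without replacement are all failures. The sketch below is a minimal version under that assumption. The report does not state its exact formula, so the function name `pass_at_k` and the handling of k larger than n (0% with no passes, 100% otherwise) are guesses chosen to reproduce the values shown.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Estimate the probability that at least one of k samples passes.

    n: total samples, c: passing samples, k: samples drawn without replacement.
    """
    if c == 0:
        return 0.0  # assumption: no passing sample means 0% for every k
    if n - c < k:
        return 1.0  # too few failures to fill a draw of k, so a pass is certain
    return 1.0 - comb(n - c, k) / comb(n, k)


# 5 samples with 1 pass, as in the last aggregate row above.
for k in (1, 5, 10):
    print(f"pass@{k}: {pass_at_k(5, 1, k):.0%}")
```

For 5 samples and 1 pass this prints 20%, 100%, and 100%, which matches the reported row. Because any single pass among 5 samples already yields pass@5 of 100%, the larger 25-sample control set is the more informative aggregate.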
Sample 0 (Claude Haiku 4.5)
Instruction set: Control
FAIL | Latency 0.00s cached
Axe WCAG: 1 | BP: 5 | $0.0084
Assertions ❌
- ✅: All examples have a valid semantics (R): pass
- ❌: Collapsed content is hidden from assistive technology (R): fail
Axe WCAG Failures (1) ❌
- (1x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (5) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 1 (Claude Haiku 4.5)
Instruction set: Control
FAIL | Latency 0.00s cached
Axe WCAG: 0 | BP: 5 | $0.0078
Assertions ❌
- ❌: All examples have a valid semantics (R): fail
- ✅: Collapsed content is hidden from assistive technology (R): pass
Axe Best Practice Issues (5) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 2 (Claude Haiku 4.5)
Instruction set: Control
FAIL | Latency 0.00s cached
Axe WCAG: 0 | BP: 6 | $0.0075
Assertions ❌
- ❌: All examples have a valid semantics (R): fail
- ✅: Collapsed content is hidden from assistive technology (R): pass
Axe Best Practice Issues (6) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 3 (Claude Haiku 4.5)
Instruction set: Control
FAIL | Latency 0.00s cached
Axe WCAG: 0 | BP: 6 | $0.0088
Assertions ❌
- ❌: All examples have a valid semantics (R): fail
- ✅: Collapsed content is hidden from assistive technology (R): pass
Axe Best Practice Issues (6) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 4 (Claude Haiku 4.5)
Instruction set: Control
FAIL | Latency 0.00s cached
Axe WCAG: 0 | BP: 5 | $0.0082
Assertions ❌
- ✅: All examples have a valid semantics (R): pass
- ❌: Collapsed content is hidden from assistive technology (R): fail
Axe Best Practice Issues (5) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 5 (Claude Haiku 4.5)
Instruction set: Control
FAIL | Latency 0.00s cached
Axe WCAG: 0 | BP: 6 | $0.0075
Assertions ❌
- ❌: All examples have a valid semantics (R): fail
- ✅: Collapsed content is hidden from assistive technology (R): pass
Axe Best Practice Issues (6) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 6 (Claude Haiku 4.5)
Instruction set: Control
FAIL | Latency 0.00s cached
Axe WCAG: 0 | BP: 6 | $0.0096
Assertions ❌
- ❌: All examples have a valid semantics (R): fail
- ✅: Collapsed content is hidden from assistive technology (R): pass
Axe Best Practice Issues (6) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 7 (Claude Haiku 4.5)
Instruction set: Control
FAIL | Latency 0.00s cached
Axe WCAG: 0 | BP: 6 | $0.0091
Assertions ❌
- ❌: All examples have a valid semantics (R): fail
- ✅: Collapsed content is hidden from assistive technology (R): pass
Axe Best Practice Issues (6) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 8 (Claude Haiku 4.5)
Instruction set: Control
FAIL | Latency 0.00s cached
Axe WCAG: 0 | BP: 6 | $0.0076
Assertions ❌
- ❌: All examples have a valid semantics (R): fail
- ✅: Collapsed content is hidden from assistive technology (R): pass
Axe Best Practice Issues (6) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 9 (Claude Haiku 4.5)
Instruction set: Control
FAIL | Latency 0.00s cached
Axe WCAG: 0 | BP: 5 | $0.0085
Assertions ❌
- ❌: All examples have a valid semantics (R): fail
- ✅: Collapsed content is hidden from assistive technology (R): pass
Axe Best Practice Issues (5) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 10 (Claude Haiku 4.5)
Instruction set: Control
FAIL | Latency 0.00s cached
Axe WCAG: 0 | BP: 5 | $0.0079
Assertions ❌
- ❌: All examples have a valid semantics (R): fail
- ✅: Collapsed content is hidden from assistive technology (R): pass
Axe Best Practice Issues (5) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 11 (Claude Haiku 4.5)
Instruction set: Control
FAIL | Latency 0.00s cached
Axe WCAG: 0 | BP: 5 | $0.0081
Assertions ❌
- ✅: All examples have a valid semantics (R): pass
- ❌: Collapsed content is hidden from assistive technology (R): fail
Axe Best Practice Issues (5) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 12 (Claude Haiku 4.5)
Instruction set: Control
FAIL | Latency 0.00s cached
Axe WCAG: 0 | BP: 6 | $0.0092
Assertions ❌
- ✅: All examples have a valid semantics (R): pass
- ❌: Collapsed content is hidden from assistive technology (R): fail
Axe Best Practice Issues (6) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 13 (Claude Haiku 4.5)
Instruction set: Control
FAIL | Latency 0.00s cached
Axe WCAG: 0 | BP: 6 | $0.0092
Assertions ❌
- ❌: All examples have a valid semantics (R): fail
- ✅: Collapsed content is hidden from assistive technology (R): pass
Axe Best Practice Issues (6) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 14 (Claude Haiku 4.5)
Instruction set: Control
FAIL | Latency 0.00s cached
Axe WCAG: 1 | BP: 5 | $0.0084
Assertions ❌
- ✅: All examples have a valid semantics (R): pass
- ❌: Collapsed content is hidden from assistive technology (R): fail
Axe WCAG Failures (1) ❌
- (1x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (5) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 15 (Claude Haiku 4.5)
Instruction set: Control
FAIL | Latency 0.00s cached
Axe WCAG: 0 | BP: 6 | $0.0096
Assertions ❌
- ❌: All examples have a valid semantics (R): fail
- ✅: Collapsed content is hidden from assistive technology (R): pass
Axe Best Practice Issues (6) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 16 (Claude Haiku 4.5)
Instruction set: Control
FAIL | Latency 0.00s cached
Axe WCAG: 0 | BP: 6 | $0.0083
Assertions ❌
- ❌: All examples have a valid semantics (R): fail
- ✅: Collapsed content is hidden from assistive technology (R): pass
Axe Best Practice Issues (6) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 17 (Claude Haiku 4.5)
Instruction set: Control
FAIL | Latency 0.00s cached
Axe WCAG: 0 | BP: 6 | $0.0089
Assertions ❌
- ❌: All examples have a valid semantics (R): fail
- ✅: Collapsed content is hidden from assistive technology (R): pass
Axe Best Practice Issues (6) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 18 (Claude Haiku 4.5)
Instruction set: Control
FAIL | Latency 0.00s cached
Axe WCAG: 0 | BP: 10 | $0.0085
Assertions ❌
- ❌: All examples have a valid semantics (R): fail
- ✅: Collapsed content is hidden from assistive technology (R): pass
Axe Best Practice Issues (10) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 19 (Claude Haiku 4.5)
Instruction set: Control
FAIL | Latency 0.00s cached
Axe WCAG: 0 | BP: 6 | $0.0075
Assertions ❌
- ❌: All examples have a valid semantics (R): fail
- ✅: Collapsed content is hidden from assistive technology (R): pass
Axe Best Practice Issues (6) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 20 (Claude Haiku 4.5)
Instruction set: Control
FAIL | Latency 0.00s cached
Axe WCAG: 0 | BP: 6 | $0.0088
Assertions ❌
- ❌: All examples have a valid semantics (R): fail
- ✅: Collapsed content is hidden from assistive technology (R): pass
Axe Best Practice Issues (6) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 21 (Claude Haiku 4.5)
Instruction set: Control
FAIL | Latency 0.00s cached
Axe WCAG: 0 | BP: 6 | $0.0079
Assertions ❌
- ❌: All examples have a valid semantics (R): fail
- ✅: Collapsed content is hidden from assistive technology (R): pass
Axe Best Practice Issues (6) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 22 (Claude Haiku 4.5)
Instruction set: Control
FAIL | Latency 0.00s cached
Axe WCAG: 0 | BP: 6 | $0.0075
Assertions ❌
- ❌: All examples have a valid semantics (R): fail
- ✅: Collapsed content is hidden from assistive technology (R): pass
Axe Best Practice Issues (6) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 23 (Claude Haiku 4.5)
Instruction set: Control
FAIL | Latency 0.00s cached
Axe WCAG: 0 | BP: 6 | $0.0090
Assertions ❌
- ❌: All examples have a valid semantics (R): fail
- ✅: Collapsed content is hidden from assistive technology (R): pass
Axe Best Practice Issues (6) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 24 (Claude Haiku 4.5)
Instruction set: Control
FAIL | Latency 0.00s cached
Axe WCAG: 0 | BP: 6 | $0.0099
Assertions ❌
- ✅: All examples have a valid semantics (R): pass
- ❌: Collapsed content is hidden from assistive technology (R): fail
Axe Best Practice Issues (6) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 0 (Claude Haiku 4.5)
Instruction set: 1. Basic
FAIL | Latency 0.00s cached
Axe WCAG: 0 | BP: 8 | $0.0125
Assertions ❌
- ✅: All examples have a valid semantics (R): pass
- ❌: Collapsed content is hidden from assistive technology (R): fail
Axe Best Practice Issues (8) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 1 (Claude Haiku 4.5)
Instruction set: 1. Basic
FAIL | Latency 0.00s cached
Axe WCAG: 0 | BP: 10 | $0.0135
Assertions ❌
- ✅: All examples have a valid semantics (R): pass
- ❌: Collapsed content is hidden from assistive technology (R): fail - elementHandle.$eval: Failed to find element matching selector ".details"
Axe Best Practice Issues (10) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 2 (Claude Haiku 4.5)
Instruction set: 1. Basic
FAIL | Latency 0.00s cached
Axe WCAG: 0 | BP: 9 | $0.0132
Assertions ❌
- ✅: All examples have a valid semantics (R): pass
- ❌: Collapsed content is hidden from assistive technology (R): fail - elementHandle.$eval: Failed to find element matching selector ".details"
Axe Best Practice Issues (9) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 3 (Claude Haiku 4.5)
Instruction set: 1. Basic
FAIL | Latency 0.00s cached
Axe WCAG: 0 | BP: 10 | $0.0160
Assertions ❌
- ✅: All examples have a valid semantics (R): pass
- ❌: Collapsed content is hidden from assistive technology (R): fail
Axe Best Practice Issues (10) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 4 (Claude Haiku 4.5)
Instruction set: 1. Basic
FAIL | Latency 0.00s cached
Axe WCAG: 2 | BP: 8 | $0.0124
Assertions ❌
- ✅: All examples have a valid semantics (R): pass
- ❌: Collapsed content is hidden from assistive technology (R): fail
Axe WCAG Failures (2) ❌
- (2x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (8) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 0 (Claude Haiku 4.5)
Instruction set: 0. Minimal
FAIL | Latency 0.00s cached
Axe WCAG: 0 | BP: 8 | $0.0088
Assertions ❌
- ✅: All examples have a valid semantics (R): pass
- ❌: Collapsed content is hidden from assistive technology (R): fail
Axe Best Practice Issues (8) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 1 (Claude Haiku 4.5)
Instruction set: 0. Minimal
FAIL | Latency 0.00s cached
Axe WCAG: 0 | BP: 10 | $0.0098
Assertions ❌
- ✅: All examples have a valid semantics (R): pass
- ❌: Collapsed content is hidden from assistive technology (R): fail
Axe Best Practice Issues (10) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 2 (Claude Haiku 4.5)
Instruction set: 0. Minimal
FAIL | Latency 0.00s cached
Axe WCAG: 0 | BP: 5 | $0.0093
Assertions ❌
- ✅: All examples have a valid semantics (R): pass
- ❌: Collapsed content is hidden from assistive technology (R): fail
Axe Best Practice Issues (5) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 3 (Claude Haiku 4.5)
Instruction set: 0. Minimal
FAIL | Latency 0.00s cached
Axe WCAG: 0 | BP: 8 | $0.0092
Assertions ❌
- ✅: All examples have a valid semantics (R): pass
- ❌: Collapsed content is hidden from assistive technology (R): fail
Axe Best Practice Issues (8) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 4 (Claude Haiku 4.5)
Instruction set: 0. Minimal
FAIL | Latency 0.00s cached
Axe WCAG: 4 | BP: 10 | $0.0098
Assertions ❌
- ✅: All examples have a valid semantics (R): pass
- ❌: Collapsed content is hidden from assistive technology (R): fail
Axe WCAG Failures (4) ❌
- (4x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (10) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 0 (Claude Haiku 4.5)
Instruction set: 2. Detailed Instructions
FAIL | Latency 0.00s cached
Axe WCAG: 0 | $0.0193
Assertions ❌
- ✅: All examples have a valid semantics (R): pass
- ❌: Collapsed content is hidden from assistive technology (R): fail
Sample 1 (Claude Haiku 4.5)
Instruction set: 2. Detailed Instructions
FAIL | Latency 0.00s cached
Axe WCAG: 0 | $0.0189
Assertions ❌
- ✅: All examples have a valid semantics (R): pass
- ❌: Collapsed content is hidden from assistive technology (R): fail
Sample 2 (Claude Haiku 4.5)
Instruction set: 2. Detailed Instructions
FAIL | Latency 20.01s
Axe WCAG: 0 | $0.0208
Assertions ❌
- ✅: All examples have a valid semantics (R): pass
- ❌: Collapsed content is hidden from assistive technology (R): fail
Sample 3 (Claude Haiku 4.5)
Instruction set: 2. Detailed Instructions
PASS | Latency 15.36s
Axe WCAG: 0 | $0.0182
Assertions ✅
- ✅: All examples have a valid semantics (R): pass
- ✅: Collapsed content is hidden from assistive technology (R): pass
Sample 4 (Claude Haiku 4.5)
Instruction set: 2. Detailed Instructions
FAIL | Latency 17.09s
Axe WCAG: 0 | $0.0179
Assertions ❌
- ✅: All examples have a valid semantics (R): pass
- ❌: Collapsed content is hidden from assistive technology (R): fail
Claude Opus 4.6
Aggregates per instruction set (one row per set):
| Samples | Passes | pass@1 | pass@5 | pass@10 |
|---|---|---|---|---|
| 25 | 0 | 0% | 0% | 0% |
| 5 | 0 | 0% | 0% | 0% |
| 5 | 0 | 0% | 0% | 0% |
| 5 | 5 | 100% | 100% | 100% |
Sample 0 (Claude Opus 4.6)
Instruction set: Control
FAIL | Latency 0.00s cached
Axe WCAG: 0 | BP: 6 | $0.0298
Assertions ❌
- ❌: All examples have a valid semantics (R): fail
- ✅: Collapsed content is hidden from assistive technology (R): pass
Axe Best Practice Issues (6) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 1 (Claude Opus 4.6)
Instruction set: Control
FAIL | Latency 0.00s cached
Axe WCAG: 4 | BP: 6 | $0.0341
Assertions ❌
- ❌: All examples have a valid semantics (R): fail
- ✅: Collapsed content is hidden from assistive technology (R): pass
Axe WCAG Failures (4) ❌
- (4x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (6) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 2 (Claude Opus 4.6)
Instruction set: Control
FAIL | Latency 0.00s cached
Axe WCAG: 4 | BP: 6 | $0.0337
Assertions ❌
- ❌: All examples have a valid semantics (R): fail
- ✅: Collapsed content is hidden from assistive technology (R): pass
Axe WCAG Failures (4) ❌
- (4x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (6) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 3 (Claude Opus 4.6)
Instruction set: Control
FAIL | Latency 0.00s cached
Axe WCAG: 0 | BP: 6 | $0.0325
Assertions ❌
- ❌: All examples have a valid semantics (R): fail
- ✅: Collapsed content is hidden from assistive technology (R): pass
Axe Best Practice Issues (6) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 4 (Claude Opus 4.6)
Instruction set: Control
FAIL | Latency 0.00s cached
Axe WCAG: 4 | BP: 6 | $0.0328
Assertions ❌
- ❌: All examples have a valid semantics (R): fail
- ✅: Collapsed content is hidden from assistive technology (R): pass
Axe WCAG Failures (4) ❌
- (4x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (6) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 5 (Claude Opus 4.6)
Instruction set: Control
FAIL | Latency 0.00s cached
Axe WCAG: 0 | BP: 6 | $0.0315
Assertions ❌
- ❌: All examples have a valid semantics (R): fail
- ✅: Collapsed content is hidden from assistive technology (R): pass
Axe Best Practice Issues (6) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 6 (Claude Opus 4.6)
Instruction set: Control
FAIL | Latency 0.00s cached
Axe WCAG: 0 | BP: 6 | $0.0306
Assertions ❌
- ❌: All examples have a valid semantics (R): fail
- ✅: Collapsed content is hidden from assistive technology (R): pass
Axe Best Practice Issues (6) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 7 (Claude Opus 4.6)
Instruction set: Control
FAIL | Latency 0.00s cached
Axe WCAG: 4 | BP: 6 | $0.0329
Assertions ❌
- ❌: All examples have a valid semantics (R): fail
- ✅: Collapsed content is hidden from assistive technology (R): pass
Axe WCAG Failures (4) ❌
- (4x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (6) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 8 (Claude Opus 4.6)
Instruction set: Control
FAIL | Latency 0.00s cached
Axe WCAG: 4 | BP: 6 | $0.0342
Assertions ❌
- ❌: All examples have a valid semantics (R): fail
- ✅: Collapsed content is hidden from assistive technology (R): pass
Axe WCAG Failures (4) ❌
- (4x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (6) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 9 (Claude Opus 4.6)
Instruction set: Control
FAIL | Latency 0.00s cached
Axe WCAG: 0 | BP: 6 | $0.0316
Assertions ❌
- ❌: All examples have a valid semantics (R): fail
- ✅: Collapsed content is hidden from assistive technology (R): pass
Axe Best Practice Issues (6) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 10 (Claude Opus 4.6)
Instruction set: Control
FAIL | Latency 0.00s cached
Axe WCAG: 4 | BP: 6 | $0.0346
Assertions ❌
- ❌: All examples have a valid semantics (R): fail
- ✅: Collapsed content is hidden from assistive technology (R): pass
Axe WCAG Failures (4) ❌
- (4x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (6) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 11 (Claude Opus 4.6)
Instruction set: Control
FAIL | Latency 0.00s cached
Axe WCAG: 0 | BP: 6 | $0.0316
Assertions ❌
- ❌: All examples have a valid semantics (R): fail
- ✅: Collapsed content is hidden from assistive technology (R): pass
Axe Best Practice Issues (6) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 12 (Claude Opus 4.6)
Instruction set: Control
FAIL | Latency 0.00s cached
Axe WCAG: 4 | BP: 6 | $0.0341
Assertions ❌
- ❌: All examples have a valid semantics (R): fail
- ✅: Collapsed content is hidden from assistive technology (R): pass
Axe WCAG Failures (4) ❌
- (4x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (6) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 13 (Claude Opus 4.6)
Instruction set: Control
FAIL | Latency 0.00s cached
Axe WCAG: 0 | BP: 6 | $0.0297
Assertions ❌
- ❌: All examples have a valid semantics (R): fail
- ✅: Collapsed content is hidden from assistive technology (R): pass
Axe Best Practice Issues (6) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 14 (Claude Opus 4.6)
Instruction set: Control
FAIL | Latency 0.00s cached
Axe WCAG: 4 | BP: 6 | $0.0353
Assertions ❌
- ❌: All examples have a valid semantics (R): fail
- ✅: Collapsed content is hidden from assistive technology (R): pass
Axe WCAG Failures (4) ❌
- (4x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (6) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 15 (Claude Opus 4.6)
Instruction set: Control
FAIL | Latency 0.00s cached
Axe WCAG: 3 | BP: 5 | $0.0295
Assertions ❌
- ❌: All examples have a valid semantics (R): fail
- ✅: Collapsed content is hidden from assistive technology (R): pass
Axe WCAG Failures (3) ❌
- (3x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (5) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 16 (Claude Opus 4.6)
Instruction set: Control
FAIL | Latency 0.00s cached
Axe WCAG: 4 | BP: 6 | $0.0340
Assertions ❌
- ❌: All examples have a valid semantics (R): fail
- ✅: Collapsed content is hidden from assistive technology (R): pass
Axe WCAG Failures (4) ❌
- (4x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (6) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 17 (Claude Opus 4.6)
Instruction set: Control
FAIL | Latency 0.00s cached
Axe WCAG: 3 | BP: 5 | $0.0298
Assertions ❌
- ❌: All examples have a valid semantics (R): fail
- ✅: Collapsed content is hidden from assistive technology (R): pass
Axe WCAG Failures (3) ❌
- (3x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (5) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 18 (Claude Opus 4.6)
Instruction set: Control
FAIL | Latency 0.00s cached
Axe WCAG: 0 | BP: 6 | $0.0297
Assertions ❌
- ❌: All examples have a valid semantics (R): fail
- ✅: Collapsed content is hidden from assistive technology (R): pass
Axe Best Practice Issues (6) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 19 (Claude Opus 4.6)
Instruction set: Control
FAIL | Latency 0.00s cached
Axe WCAG: 4 | BP: 6 | $0.0335
Assertions ❌
- ❌: All examples have a valid semantics (R): fail
- ✅: Collapsed content is hidden from assistive technology (R): pass
Axe WCAG Failures (4) ❌
- (4x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (6) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 20 (Claude Opus 4.6)
Instruction set: Control
FAIL | Latency 0.00s cached
Axe WCAG: 4 | BP: 6 | $0.0342
Assertions ❌
- ❌: All examples have a valid semantics (R): fail
- ✅: Collapsed content is hidden from assistive technology (R): pass
Axe WCAG Failures (4) ❌
- (4x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (6) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 21 (Claude Opus 4.6)
Instruction set: Control
FAIL | Latency 0.00s cached
Axe WCAG: 3 | BP: 5 | $0.0298
Assertions ❌
- ❌: All examples have a valid semantics (R): fail
- ✅: Collapsed content is hidden from assistive technology (R): pass
Axe WCAG Failures (3) ❌
- (3x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (5) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 22 (Claude Opus 4.6)
Instruction set: Control
FAIL | Latency 0.00s cached
Axe WCAG: 0 | BP: 6 | $0.0323
Assertions ❌
- ❌: All examples have a valid semantics (R): fail
- ✅: Collapsed content is hidden from assistive technology (R): pass
Axe Best Practice Issues (6) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 23 (Claude Opus 4.6)
Instruction set: Control
FAIL | Latency 0.00s cached
Axe WCAG: 0 | BP: 6 | $0.0301
Assertions ❌
- ❌: All examples have a valid semantics (R): fail
- ✅: Collapsed content is hidden from assistive technology (R): pass
Axe Best Practice Issues (6) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 24 (Claude Opus 4.6)
Instruction set: Control
FAIL | Latency 0.00s cached
Axe WCAG: 0 | BP: 6 | $0.0302
Assertions ❌
- ❌: All examples have a valid semantics (R): fail
- ✅: Collapsed content is hidden from assistive technology (R): pass
Axe Best Practice Issues (6) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 0 (Claude Opus 4.6)
Instruction set: 1. Basic
FAIL | Latency 0.00s cached
Axe WCAG: 0 | BP: 2 | $0.0557
Assertions ❌
- ✅: All examples have a valid semantics (R): pass
- ❌: Collapsed content is hidden from assistive technology (R): fail
Axe Best Practice Issues (2) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 1 (Claude Opus 4.6)
Instruction set: 1. Basic
FAIL | Latency 0.00s cached
Axe WCAG: 0 | BP: 2 | $0.0630
Assertions ❌
- ✅: All examples have a valid semantics (R): pass
- ❌: Collapsed content is hidden from assistive technology (R): fail
Axe Best Practice Issues (2) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 2 (Claude Opus 4.6)
Instruction set: 1. Basic
FAIL | Latency 0.00s cached
Axe WCAG: 0 | BP: 2 | $0.0609
Assertions ❌
- ✅: All examples have a valid semantics (R): pass
- ❌: Collapsed content is hidden from assistive technology (R): fail
Axe Best Practice Issues (2) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 3 (Claude Opus 4.6)
Instruction set: 1. Basic
FAIL | Latency 0.00s cached
Axe WCAG: 0 | BP: 2 | $0.0557
Assertions ❌
- ✅: All examples have a valid semantics (R): pass
- ❌: Collapsed content is hidden from assistive technology (R): fail
Axe Best Practice Issues (2) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 4 (Claude Opus 4.6)
Instruction set: 1. Basic
FAIL | Latency 0.00s cached
Axe WCAG: 0 | BP: 2 | $0.0611
Assertions ❌
- ✅: All examples have a valid semantics (R): pass
- ❌: Collapsed content is hidden from assistive technology (R): fail
Axe Best Practice Issues (2) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 0 (Claude Opus 4.6)
Instruction set: 0. Minimal
FAIL | Latency 0.00s cached
Axe WCAG: 0 | BP: 2 | $0.0422
Assertions ❌
- ✅: All examples have a valid semantics (R): pass
- ❌: Collapsed content is hidden from assistive technology (R): fail
Axe Best Practice Issues (2) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 1 (Claude Opus 4.6)
Instruction set: 0. Minimal
FAIL | Latency 0.00s cached
Axe WCAG: 0 | BP: 2 | $0.0392
Assertions ❌
- ✅: All examples have a valid semantics (R): pass
- ❌: Collapsed content is hidden from assistive technology (R): fail
Axe Best Practice Issues (2) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 2 (Claude Opus 4.6)
Instruction set: 0. Minimal
FAIL | Latency 0.00s cached
Axe WCAG: 0 | BP: 2 | $0.0415
Assertions ❌
- ✅: All examples have a valid semantics (R): pass
- ❌: Collapsed content is hidden from assistive technology (R): fail
Axe Best Practice Issues (2) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 3 (Claude Opus 4.6)
Instruction set: 0. Minimal
FAIL | Latency 0.00s cached
Axe WCAG: 0 | BP: 2 | $0.0373
Assertions ❌
- ✅: All examples have a valid semantics (R): pass
- ❌: Collapsed content is hidden from assistive technology (R): fail
Axe Best Practice Issues (2) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 4 (Claude Opus 4.6)
Instruction set: 0. Minimal
FAIL | Latency 0.00s cached
Axe WCAG: 3 | BP: 2 | $0.0363
Assertions ❌
- ✅: All examples have a valid semantics (R): pass
- ❌: Collapsed content is hidden from assistive technology (R): fail
Axe WCAG Failures (3) ❌
- (3x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (2) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
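The color-contrast failures reported above refer to the WCAG 2 AA minimum contrast ratio. As a rough sketch of what that check measures, the following computes the contrast ratio from the WCAG 2.x relative-luminance definition and compares it with the AA thresholds (4.5:1 for normal text, 3:1 for large text). The function names are illustrative, the code only handles opaque six-digit hex colors, and it is not how axe itself is implemented.

```python
def _linearize(channel_8bit: int) -> float:
    """Convert an 8-bit sRGB channel to its linear value (WCAG 2.x definition)."""
    c = channel_8bit / 255.0
    return c / 12.92 if c <= 0.03928 else ((c + 0.055) / 1.055) ** 2.4


def relative_luminance(hex_color: str) -> float:
    """Relative luminance of an opaque '#rrggbb' color."""
    value = hex_color.lstrip("#")
    r, g, b = (int(value[i:i + 2], 16) for i in (0, 2, 4))
    return 0.2126 * _linearize(r) + 0.7152 * _linearize(g) + 0.0722 * _linearize(b)


def contrast_ratio(foreground: str, background: str) -> float:
    """Contrast ratio (lighter + 0.05) / (darker + 0.05), ranging from 1 to 21."""
    l1 = relative_luminance(foreground)
    l2 = relative_luminance(background)
    lighter, darker = max(l1, l2), min(l1, l2)
    return (lighter + 0.05) / (darker + 0.05)


def meets_aa(foreground: str, background: str, large_text: bool = False) -> bool:
    """True if the pair meets the WCAG 2 AA minimum (4.5:1, or 3:1 for large text)."""
    threshold = 3.0 if large_text else 4.5
    return contrast_ratio(foreground, background) >= threshold
```

For example, `meets_aa("#767676", "#ffffff")` holds while `meets_aa("#777777", "#ffffff")` does not, which shows how small a color change can flip a result like the ones listed in these samples.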
Sample 0 (Claude Opus 4.6)
Instruction set: 2. Detailed Instructions
PASS | Latency 0.00s cached
Axe WCAG: 0 | $0.0888
Assertions ✅
- ✅: All examples have a valid semantics (R): pass
- ✅: Collapsed content is hidden from assistive technology (R): pass
Sample 1 (Claude Opus 4.6)
Instruction set: 2. Detailed Instructions
PASS | Latency 0.00s cached
Axe WCAG: 0 | $0.0864
Assertions ✅
- ✅: All examples have a valid semantics (R): pass
- ✅: Collapsed content is hidden from assistive technology (R): pass
Sample 2 (Claude Opus 4.6)
Instruction set: 2. Detailed Instructions
PASS | Latency 30.95s
Axe WCAG: 0 | $0.0832
Assertions ✅
- ✅: All examples have a valid semantics (R): pass
- ✅: Collapsed content is hidden from assistive technology (R): pass
Sample 3 (Claude Opus 4.6)
Instruction set: 2. Detailed Instructions
PASS | Latency 30.97s
Axe WCAG: 0 | $0.0870
Assertions ✅
- ✅: All examples have a valid semantics (R): pass
- ✅: Collapsed content is hidden from assistive technology (R): pass
Sample 4 (Claude Opus 4.6)
Instruction set: 2. Detailed Instructions
PASS | Latency 35.17s
Axe WCAG: 0 | $0.0908
Assertions ✅
- ✅: All examples have a valid semantics (R): pass
- ✅: Collapsed content is hidden from assistive technology (R): pass
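Judging from the samples above, a sample appears to be marked PASS only when it has zero Axe WCAG failures and every assertion passes, while Best Practice issues are reported but do not affect the result. The sketch below encodes that inferred rule. The `SampleResult` type and its field names are hypothetical and are not taken from the benchmark's own code.

```python
from dataclasses import dataclass, field


@dataclass
class SampleResult:
    """Hypothetical record of one sample, mirroring the fields shown in the logs."""
    axe_wcag_failures: int
    best_practice_issues: int = 0
    # Maps assertion name -> True if the assertion passed.
    assertions: dict[str, bool] = field(default_factory=dict)


def sample_passes(result: SampleResult) -> bool:
    """Inferred rule: no Axe WCAG failures and no failed assertions.

    Best Practice issues are deliberately ignored, matching the
    "does not affect pass/fail" note in the logs.
    """
    return result.axe_wcag_failures == 0 and all(result.assertions.values())


# A Control-style sample: no Axe WCAG failures, but one assertion fails.
control = SampleResult(
    axe_wcag_failures=0,
    best_practice_issues=6,
    assertions={"valid semantics": False, "collapsed content hidden": True},
)

# A Detailed-style sample: no failures of either kind.
detailed = SampleResult(
    axe_wcag_failures=0,
    assertions={"valid semantics": True, "collapsed content hidden": True},
)

print(sample_passes(control), sample_passes(detailed))
```

Under this reading, a sample with passing assertions can still fail on Axe WCAG failures alone, and a sample with only Best Practice issues still passes.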
Claude Sonnet 4.5
| Samples | Passes | pass@1 | pass@5 | pass@10 |
|---|---|---|---|---|
| 25 | 0 | 0% | 0% | 0% |
| 5 | 0 | 0% | 0% | 0% |
| 5 | 5 | 100% | 100% | 100% |
| 5 | 5 | 100% | 100% | 100% |
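The pass@k figures can be reproduced with the standard combinatorial estimator, pass@k = 1 - C(n - c, k) / C(n, k), where n is the number of samples and c the number of passes. This is a sketch rather than the page's own code. It reproduces the control figures reported earlier on this page (for instance 25 samples with 9 passes gives 36%, 92% and 100%). Clamping k to n when fewer than k samples exist is an assumption, made so that the 5-sample rows yield 0% or 100% as shown.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Estimate the probability that at least one of k drawn samples passes.

    n: total samples, c: passing samples, k: number of samples drawn.
    """
    if n <= 0 or not 0 <= c <= n or k <= 0:
        raise ValueError("expected n > 0, 0 <= c <= n and k > 0")
    # Assumption: when k exceeds the number of samples, draw all n of them.
    k = min(k, n)
    # comb(n - c, k) is 0 when fewer than k failing samples exist,
    # so the estimate becomes 1.0 in that case.
    return 1.0 - comb(n - c, k) / comb(n, k)


for n, c in [(25, 0), (5, 0), (5, 5), (25, 9)]:
    row = [round(100 * pass_at_k(n, c, k)) for k in (1, 5, 10)]
    print(f"Samples: {n} | Passes: {c} | pass@1/5/10: {row}")
```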
Sample 0 (Claude Sonnet 4.5)
Instruction set: Control
FAIL | Latency 0.00s cached
Axe WCAG: 0 | BP: 2 | $0.0344
Assertions ❌
- ❌: All examples have a valid semantics (R): fail
- ✅: Collapsed content is hidden from assistive technology (R): pass
Axe Best Practice Issues (2) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 1 (Claude Sonnet 4.5)
Instruction set: Control
FAIL | Latency 0.00s cached
Axe WCAG: 0 | BP: 3 | $0.0361
Assertions ❌
- ❌: All examples have a valid semantics (R): fail
- ✅: Collapsed content is hidden from assistive technology (R): pass
Axe Best Practice Issues (3) ⚠️
- heading-order (moderate): Ensure the order of headings is semantically correct (Best Practice - does not affect pass/fail)
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 2 (Claude Sonnet 4.5)
Instruction set: Control
FAIL | Latency 0.00s cached
Axe WCAG: 0 | BP: 2 | $0.0334
Assertions ❌
- ❌: All examples have a valid semantics (R): fail
- ✅: Collapsed content is hidden from assistive technology (R): pass
Axe Best Practice Issues (2) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 3 (Claude Sonnet 4.5)
Instruction set: Control
FAIL | Latency 0.00s cached
Axe WCAG: 0 | BP: 2 | $0.0344
Assertions ❌
- ❌: All examples have a valid semantics (R): fail
- ✅: Collapsed content is hidden from assistive technology (R): pass
Axe Best Practice Issues (2) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 4 (Claude Sonnet 4.5)
Instruction set: Control
FAIL | Latency 0.00s cached
Axe WCAG: 0 | BP: 2 | $0.0366
Assertions ❌
- ❌: All examples have a valid semantics (R): fail
- ✅: Collapsed content is hidden from assistive technology (R): pass
Axe Best Practice Issues (2) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 5 (Claude Sonnet 4.5)
Instruction set: Control
FAIL | Latency 0.00s cached
Axe WCAG: 4 | BP: 2 | $0.0325
Assertions ❌
- ❌: All examples have a valid semantics (R): fail
- ✅: Collapsed content is hidden from assistive technology (R): pass
Axe WCAG Failures (4) ❌
- (4x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (2) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 6 (Claude Sonnet 4.5)
Instruction set: Control
FAIL | Latency 0.00s cached
Axe WCAG: 0 | BP: 2 | $0.0335
Assertions ❌
- ❌: All examples have a valid semantics (R): fail
- ✅: Collapsed content is hidden from assistive technology (R): pass
Axe Best Practice Issues (2) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 7 (Claude Sonnet 4.5)
Instruction set: Control
FAIL | Latency 0.00s cached
Axe WCAG: 0 | BP: 2 | $0.0325
Assertions ❌
- ❌: All examples have a valid semantics (R): fail
- ✅: Collapsed content is hidden from assistive technology (R): pass
Axe Best Practice Issues (2) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 8 (Claude Sonnet 4.5)
Instruction set: Control
FAIL | Latency 0.00s cached
Axe WCAG: 0 | BP: 2 | $0.0341
Assertions ❌
- ❌: All examples have a valid semantics (R): fail
- ✅: Collapsed content is hidden from assistive technology (R): pass
Axe Best Practice Issues (2) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 9 (Claude Sonnet 4.5)
Instruction set: Control
FAIL | Latency 0.00s cached
Axe WCAG: 0 | BP: 2 | $0.0330
Assertions ❌
- ❌: All examples have a valid semantics (R): fail
- ✅: Collapsed content is hidden from assistive technology (R): pass
Axe Best Practice Issues (2) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 10 (Claude Sonnet 4.5)
Instruction set: Control
FAIL | Latency 0.00s cached
Axe WCAG: 0 | BP: 2 | $0.0370
Assertions ❌
- ❌: All examples have a valid semantics (R): fail
- ✅: Collapsed content is hidden from assistive technology (R): pass
Axe Best Practice Issues (2) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 11 (Claude Sonnet 4.5)
Instruction set: Control
FAIL | Latency 0.00s cached
Axe WCAG: 0 | BP: 2 | $0.0334
Assertions ❌
- ❌: All examples have a valid semantics (R): fail
- ✅: Collapsed content is hidden from assistive technology (R): pass
Axe Best Practice Issues (2) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 12 (Claude Sonnet 4.5)
Instruction set: Control
FAIL | Latency 0.00s cached
Axe WCAG: 0 | BP: 2 | $0.0332
Assertions ❌
- ❌: All examples have a valid semantics (R): fail
- ✅: Collapsed content is hidden from assistive technology (R): pass
Axe Best Practice Issues (2) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 13 (Claude Sonnet 4.5)
Instruction set: Control
FAIL | Latency 0.00s cached
Axe WCAG: 0 | BP: 2 | $0.0351
Assertions ❌
- ❌: All examples have a valid semantics (R): fail
- ✅: Collapsed content is hidden from assistive technology (R): pass
Axe Best Practice Issues (2) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 14 (Claude Sonnet 4.5)
Instruction set: Control
FAIL | Latency 0.00s cached
Axe WCAG: 0 | BP: 2 | $0.0342
Assertions ❌
- ❌: All examples have a valid semantics (R): fail
- ✅: Collapsed content is hidden from assistive technology (R): pass
Axe Best Practice Issues (2) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 15 (Claude Sonnet 4.5)
Instruction set: Control
FAIL | Latency 0.00s cached
Axe WCAG: 0 | BP: 2 | $0.0403
Assertions ❌
- ❌: All examples have a valid semantics (R): fail
- ✅: Collapsed content is hidden from assistive technology (R): pass
Axe Best Practice Issues (2) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 16 (Claude Sonnet 4.5)
Instruction set: Control
FAIL | Latency 0.00s cached
Axe WCAG: 0 | BP: 2 | $0.0366
Assertions ❌
- ❌: All examples have a valid semantics (R): fail
- ✅: Collapsed content is hidden from assistive technology (R): pass
Axe Best Practice Issues (2) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 17 (Claude Sonnet 4.5)
Instruction set: Control
FAIL | Latency 0.00s cached
Axe WCAG: 0 | BP: 2 | $0.0346
Assertions ❌
- ❌: All examples have a valid semantics (R): fail
- ✅: Collapsed content is hidden from assistive technology (R): pass
Axe Best Practice Issues (2) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 18 (Claude Sonnet 4.5)
Instruction set: Control
FAIL | Latency 0.00s cached
Axe WCAG: 0 | BP: 2 | $0.0364
Assertions ❌
- ❌: All examples have a valid semantics (R): fail
- ✅: Collapsed content is hidden from assistive technology (R): pass
Axe Best Practice Issues (2) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 19 (Claude Sonnet 4.5)
Instruction set: Control
FAIL | Latency 0.00s cached
Axe WCAG: 0 | BP: 2 | $0.0341
Assertions ❌
- ❌: All examples have a valid semantics (R): fail
- ✅: Collapsed content is hidden from assistive technology (R): pass
Axe Best Practice Issues (2) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 20 (Claude Sonnet 4.5)
Instruction set: Control
FAIL | Latency 0.00s cached
Axe WCAG: 0 | BP: 2 | $0.0346
Assertions ❌
- ❌: All examples have a valid semantics (R): fail
- ✅: Collapsed content is hidden from assistive technology (R): pass
Axe Best Practice Issues (2) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 21 (Claude Sonnet 4.5)
Instruction set: Control
FAIL | Latency 0.00s cached
Axe WCAG: 0 | BP: 2 | $0.0394
Assertions ❌
- ❌: All examples have a valid semantics (R): fail
- ✅: Collapsed content is hidden from assistive technology (R): pass
Axe Best Practice Issues (2) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 22 (Claude Sonnet 4.5)
Instruction set: Control
FAIL | Latency 0.00s cached
Axe WCAG: 0 | BP: 2 | $0.0377
Assertions ❌
- ❌: All examples have a valid semantics (R): fail
- ✅: Collapsed content is hidden from assistive technology (R): pass
Axe Best Practice Issues (2) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 23 (Claude Sonnet 4.5)
Instruction set: Control
FAIL | Latency 0.00s cached
Axe WCAG: 0 | BP: 2 | $0.0355
Assertions ❌
- ❌: All examples have a valid semantics (R): fail
- ✅: Collapsed content is hidden from assistive technology (R): pass
Axe Best Practice Issues (2) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 24 (Claude Sonnet 4.5)
Instruction set: Control
FAIL | Latency 0.00s cached
Axe WCAG: 0 | BP: 2 | $0.0402
Assertions ❌
- ❌: All examples have a valid semantics (R): fail
- ✅: Collapsed content is hidden from assistive technology (R): pass
Axe Best Practice Issues (2) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 0 (Claude Sonnet 4.5)
Instruction set: 1. Basic
PASS | Latency 0.00s cached
Axe WCAG: 0 | BP: 5 | $0.0246
Assertions ✅
- ✅: All examples have a valid semantics (R): pass
- ✅: Collapsed content is hidden from assistive technology (R): pass
Axe Best Practice Issues (5) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 1 (Claude Sonnet 4.5)
Instruction set: 1. Basic
PASS | Latency 0.00s cached
Axe WCAG: 0 | BP: 5 | $0.0231
Assertions ✅
- ✅: All examples have a valid semantics (R): pass
- ✅: Collapsed content is hidden from assistive technology (R): pass
Axe Best Practice Issues (5) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 2 (Claude Sonnet 4.5)
Instruction set: 1. Basic
PASS | Latency 0.00s cached
Axe WCAG: 0 | BP: 5 | $0.0235
Assertions ✅
- ✅: All examples have a valid semantics (R): pass
- ✅: Collapsed content is hidden from assistive technology (R): pass
Axe Best Practice Issues (5) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 3 (Claude Sonnet 4.5)
Instruction set: 1. Basic
PASS | Latency 0.00s cached
Axe WCAG: 0 | BP: 6 | $0.0273
Assertions ✅
- ✅: All examples have a valid semantics (R): pass
- ✅: Collapsed content is hidden from assistive technology (R): pass
Axe Best Practice Issues (6) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 4 (Claude Sonnet 4.5)
Instruction set: 1. Basic
PASS | Latency 0.00s cached
Axe WCAG: 0 | BP: 6 | $0.0271
Assertions ✅
- ✅: All examples have a valid semantics (R): pass
- ✅: Collapsed content is hidden from assistive technology (R): pass
Axe Best Practice Issues (6) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 0 (Claude Sonnet 4.5)
Instruction set: 0. Minimal
FAIL | Latency 0.00s cached
Axe WCAG: 3 | BP: 8 | $0.0254
Assertions ❌
- ✅: All examples have a valid semantics (R): pass
- ❌: Collapsed content is hidden from assistive technology (R): fail
Axe WCAG Failures (3) ❌
- (3x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (8) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 1 (Claude Sonnet 4.5)
Instruction set: 0. Minimal
FAIL | Latency 0.00s cached
Axe WCAG: 4 | BP: 6 | $0.0312
Assertions ✅
- ✅: All examples have a valid semantics (R): pass
- ✅: Collapsed content is hidden from assistive technology (R): pass
Axe WCAG Failures (4) ❌
- (4x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (6) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
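The record above fails overall even though both assertions pass, because Axe reports four WCAG failures. The records in this section are consistent with a simple rule: a sample passes only when Axe reports zero WCAG failures and every assertion passes, while best-practice issues are reported but ignored. A minimal Python sketch of that rule follows. The names `SampleResult` and `verdict` are hypothetical and are not taken from the benchmark's own code.

```python
from dataclasses import dataclass, field


@dataclass
class SampleResult:
    """One sample record (hypothetical structure, for illustration only)."""
    axe_wcag_failures: int      # the "Axe WCAG" count
    best_practice_issues: int   # the "BP" count, reported but not scored
    assertions: dict[str, bool] = field(default_factory=dict)


def verdict(sample: SampleResult) -> str:
    """Return PASS only if there are no Axe WCAG failures and all assertions pass."""
    wcag_clean = sample.axe_wcag_failures == 0
    assertions_ok = all(sample.assertions.values())
    return "PASS" if wcag_clean and assertions_ok else "FAIL"


# Values copied from two records in this section.
minimal_sample_1 = SampleResult(4, 6, {"valid semantics": True, "collapsed content hidden": True})
basic_sample_0 = SampleResult(0, 5, {"valid semantics": True, "collapsed content hidden": True})

print(verdict(minimal_sample_1))  # FAIL
print(verdict(basic_sample_0))    # PASS
```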
Sample 2 (Claude Sonnet 4.5)
Instruction set: 0. Minimal
FAIL | Latency 0.00s cached
Axe WCAG: 3 | BP: 8 | $0.0252
Assertions ❌
- ✅: All examples have a valid semantics (R): pass
- ❌: Collapsed content is hidden from assistive technology (R): fail
Axe WCAG Failures (3) ❌
- (3x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (8) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 3 (Claude Sonnet 4.5)
Instruction set: 0. Minimal
FAIL | Latency 0.00s cached
Axe WCAG: 3 | BP: 8 | $0.0243
Assertions ❌
- ✅: All examples have a valid semantics (R): pass
- ❌: Collapsed content is hidden from assistive technology (R): fail
Axe WCAG Failures (3) ❌
- (3x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (8) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 4 (Claude Sonnet 4.5)
Instruction set: 0. Minimal
FAIL | Latency 0.00s cached
Axe WCAG: 3 | BP: 8 | $0.0267
Assertions ❌
- ✅: All examples have a valid semantics (R): pass
- ❌: Collapsed content is hidden from assistive technology (R): fail
Axe WCAG Failures (3) ❌
- (3x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (8) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 0 (Claude Sonnet 4.5)
Instruction set: 2. Detailed Instructions
PASS | Latency 0.00s cached
Axe WCAG: 0 | $0.0485
Assertions ✅
- ✅: All examples have a valid semantics (R): pass
- ✅: Collapsed content is hidden from assistive technology (R): pass
Sample 1 (Claude Sonnet 4.5)
Instruction set: 2. Detailed Instructions
PASS | Latency 0.00s cached
Axe WCAG: 0 | $0.0480
Assertions ✅
- ✅: All examples have a valid semantics (R): pass
- ✅: Collapsed content is hidden from assistive technology (R): pass
Sample 2 (Claude Sonnet 4.5)
Instruction set: 2. Detailed Instructions
PASS | Latency 32.99s
Axe WCAG: 0 | $0.0514
Assertions ✅
- ✅: All examples have a valid semantics (R): pass
- ✅: Collapsed content is hidden from assistive technology (R): pass
Sample 3 (Claude Sonnet 4.5)
Instruction set: 2. Detailed Instructions
PASS | Latency 30.91s
Axe WCAG: 0 | $0.0491
Assertions ✅
- ✅: All examples have a valid semantics (R): pass
- ✅: Collapsed content is hidden from assistive technology (R): pass
Sample 4 (Claude Sonnet 4.5)
Instruction set: 2. Detailed Instructions
PASS | Latency 29.86s
Axe WCAG: 0 | $0.0506
Assertions ✅
- ✅: All examples have a valid semantics (R): pass
- ✅: Collapsed content is hidden from assistive technology (R): pass
Gemini 3 Flash Preview
Control: 0%
0. Minimal: 100%
1. Basic: 40%
2. Detailed Instructions: 100%
Aggregates are shown when filtering to a specific instruction set.
Instruction set: Control | Samples: 25 | Passes: 0
| pass@1 | pass@5 | pass@10 |
|---|---|---|
| 0% | 0% | 0% |
Instruction set: 0. Minimal | Samples: 5 | Passes: 5
| pass@1 | pass@5 | pass@10 |
|---|---|---|
| 100% | 100% | 100% |
Instruction set: 1. Basic | Samples: 5 | Passes: 2
| pass@1 | pass@5 | pass@10 |
|---|---|---|
| 40% | 100% | 100% |
Instruction set: 2. Detailed Instructions | Samples: 5 | Passes: 5
| pass@1 | pass@5 | pass@10 |
|---|---|---|
| 100% | 100% | 100% |
Sample 0 (Gemini 3 Flash Preview)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 0 | BP: 5
Assertions ❌
- ❌: All examples have a valid semantics (R): fail
- ✅: Collapsed content is hidden from assistive technology (R): pass
Axe Best Practice Issues (5) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 1 (Gemini 3 Flash Preview)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 0 | BP: 5
Assertions ❌
- ❌: All examples have a valid semantics (R): fail
- ✅: Collapsed content is hidden from assistive technology (R): pass
Axe Best Practice Issues (5) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 2 (Gemini 3 Flash Preview)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 0 | BP: 5
Assertions ❌
- ❌: All examples have a valid semantics (R): fail
- ✅: Collapsed content is hidden from assistive technology (R): pass
Axe Best Practice Issues (5) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 3 (Gemini 3 Flash Preview)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 0 | BP: 5
Assertions ❌
- ✅: All examples have a valid semantics (R): pass
- ❌: Collapsed content is hidden from assistive technology (R): fail
Axe Best Practice Issues (5) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 4 (Gemini 3 Flash Preview)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 0 | BP: 5
Assertions ❌
- ❌: All examples have a valid semantics (R): fail
- ✅: Collapsed content is hidden from assistive technology (R): pass
Axe Best Practice Issues (5) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 5 (Gemini 3 Flash Preview)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 0 | BP: 2
Assertions ❌
- ❌: All examples have a valid semantics (R): fail
- ✅: Collapsed content is hidden from assistive technology (R): pass
Axe Best Practice Issues (2) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 6 (Gemini 3 Flash Preview)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 2 | BP: 2
Assertions ❌
- ❌: All examples have a valid semantics (R): fail
- ✅: Collapsed content is hidden from assistive technology (R): pass
Axe WCAG Failures (2) ❌
- (2x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (2) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
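The `color-contrast` failures reported above refer to the WCAG 2 AA minimum contrast thresholds: a ratio of at least 4.5:1 for normal text and 3:1 for large text. The sketch below implements the WCAG 2 relative-luminance and contrast-ratio formulas in Python. The function names and the example colors are illustrative assumptions; the report does not state which colors the failing samples used.

```python
def _linearize(channel: int) -> float:
    """Convert one 8-bit sRGB channel to linear light (WCAG 2 definition)."""
    c = channel / 255
    return c / 12.92 if c <= 0.03928 else ((c + 0.055) / 1.055) ** 2.4


def relative_luminance(rgb: tuple[int, int, int]) -> float:
    """Relative luminance of an sRGB color, from 0.0 (black) to 1.0 (white)."""
    r, g, b = (_linearize(v) for v in rgb)
    return 0.2126 * r + 0.7152 * g + 0.0722 * b


def contrast_ratio(fg: tuple[int, int, int], bg: tuple[int, int, int]) -> float:
    """Contrast ratio (lighter + 0.05) / (darker + 0.05), from 1.0 to 21.0."""
    lighter, darker = sorted(
        (relative_luminance(fg), relative_luminance(bg)), reverse=True
    )
    return (lighter + 0.05) / (darker + 0.05)


def meets_aa(fg, bg, large_text: bool = False) -> bool:
    """Check the AA threshold: 4.5:1 for normal text, 3:1 for large text."""
    return contrast_ratio(fg, bg) >= (3.0 if large_text else 4.5)


# Hypothetical colors: two near-identical grays on a white background.
white = (255, 255, 255)
print(meets_aa((0x76, 0x76, 0x76), white))  # True
print(meets_aa((0x77, 0x77, 0x77), white))  # False
```

The two grays differ by one step per channel yet fall on opposite sides of the 4.5:1 threshold, which is one reason contrast is hard to judge by eye.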
Sample 7 (Gemini 3 Flash Preview)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 0 | BP: 5
Assertions ❌
- ❌: All examples have a valid semantics (R): fail
- ✅: Collapsed content is hidden from assistive technology (R): pass
Axe Best Practice Issues (5) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 8 (Gemini 3 Flash Preview)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 3 | BP: 5
Assertions ❌
- ❌: All examples have a valid semantics (R): fail
- ✅: Collapsed content is hidden from assistive technology (R): pass
Axe WCAG Failures (3) ❌
- (3x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (5) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 9 (Gemini 3 Flash Preview)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 0 | BP: 5
Assertions ❌
- ❌: All examples have a valid semantics (R): fail
- ✅: Collapsed content is hidden from assistive technology (R): pass
Axe Best Practice Issues (5) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 10 (Gemini 3 Flash Preview)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 0 | BP: 5
Assertions ❌
- ❌: All examples have a valid semantics (R): fail
- ✅: Collapsed content is hidden from assistive technology (R): pass
Axe Best Practice Issues (5) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 11 (Gemini 3 Flash Preview)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 3 | BP: 5
Assertions ❌
- ❌: All examples have a valid semantics (R): fail
- ✅: Collapsed content is hidden from assistive technology (R): pass
Axe WCAG Failures (3) ❌
- (3x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (5) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- page-has-heading-one (moderate): Ensure that the page, or at least one of its frames contains a level-one heading (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 12 (Gemini 3 Flash Preview)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 0 | BP: 5
Assertions ❌
- ❌: All examples have a valid semantics (R): fail
- ✅: Collapsed content is hidden from assistive technology (R): pass
Axe Best Practice Issues (5) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 13 (Gemini 3 Flash Preview)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 3 | BP: 5
Assertions ❌
- ❌: All examples have a valid semantics (R): fail
- ✅: Collapsed content is hidden from assistive technology (R): pass
Axe WCAG Failures (3) ❌
- (3x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (5) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- page-has-heading-one (moderate): Ensure that the page, or at least one of its frames contains a level-one heading (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 14 (Gemini 3 Flash Preview)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 0 | BP: 2
Assertions ❌
- ❌: All examples have a valid semantics (R): fail
- ✅: Collapsed content is hidden from assistive technology (R): pass
Axe Best Practice Issues (2) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 15 (Gemini 3 Flash Preview)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 3 | BP: 2
Assertions ❌
- ❌: All examples have a valid semantics (R): fail
- ✅: Collapsed content is hidden from assistive technology (R): pass
Axe WCAG Failures (3) ❌
- (3x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (2) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 16 (Gemini 3 Flash Preview)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 0 | BP: 5
Assertions ❌
- ❌: All examples have a valid semantics (R): fail
- ✅: Collapsed content is hidden from assistive technology (R): pass
Axe Best Practice Issues (5) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 17 (Gemini 3 Flash Preview)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 0 | BP: 5
Assertions ❌
- ❌: All examples have a valid semantics (R): fail
- ✅: Collapsed content is hidden from assistive technology (R): pass
Axe Best Practice Issues (5) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 18 (Gemini 3 Flash Preview)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 0 | BP: 2
Assertions ❌
- ❌: All examples have a valid semantics (R): fail
- ✅: Collapsed content is hidden from assistive technology (R): pass
Axe Best Practice Issues (2) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 19 (Gemini 3 Flash Preview)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 0 | BP: 5
Assertions ❌
- ❌: All examples have a valid semantics (R): fail
- ✅: Collapsed content is hidden from assistive technology (R): pass
Axe Best Practice Issues (5) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 20 (Gemini 3 Flash Preview)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 0 | BP: 4
Assertions ❌
- ❌: All examples have a valid semantics (R): fail
- ✅: Collapsed content is hidden from assistive technology (R): pass
Axe Best Practice Issues (4) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 21 (Gemini 3 Flash Preview)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 0 | BP: 5
Assertions ❌
- ❌: All examples have a valid semantics (R): fail
- ✅: Collapsed content is hidden from assistive technology (R): pass
Axe Best Practice Issues (5) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 22 (Gemini 3 Flash Preview)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 0 | BP: 4
Assertions ❌
- ❌: All examples have a valid semantics (R): fail
- ✅: Collapsed content is hidden from assistive technology (R): pass
Axe Best Practice Issues (4) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 23 (Gemini 3 Flash Preview)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 0 | BP: 4
Assertions ❌
- ❌: All examples have a valid semantics (R): fail
- ✅: Collapsed content is hidden from assistive technology (R): pass
Axe Best Practice Issues (4) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 24 (Gemini 3 Flash Preview)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 0 | BP: 5
Assertions ❌
- ❌: All examples have a valid semantics (R): fail
- ✅: Collapsed content is hidden from assistive technology (R): pass
Axe Best Practice Issues (5) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 0 (Gemini 3 Flash Preview)
Instruction set: 1. Basic
FAIL | Latency 0.00s
Axe WCAG: 0 | BP: 4
Assertions ❌
- ❌: All examples have a valid semantics (R): fail
- ✅: Collapsed content is hidden from assistive technology (R): pass
Axe Best Practice Issues (4) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 1 (Gemini 3 Flash Preview)
Instruction set: 1. Basic
PASS | Latency 0.00s
Axe WCAG: 0 | BP: 2
Assertions ✅
- ✅: All examples have a valid semantics (R): pass
- ✅: Collapsed content is hidden from assistive technology (R): pass
Axe Best Practice Issues (2) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 2 (Gemini 3 Flash Preview)
Instruction set: 1. Basic
FAIL | Latency 0.00s
Axe WCAG: 0 | BP: 4
Assertions ❌
- ❌: All examples have a valid semantics (R): fail
- ✅: Collapsed content is hidden from assistive technology (R): pass
Axe Best Practice Issues (4) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 3 (Gemini 3 Flash Preview)
Instruction set: 1. Basic
PASS | Latency 0.00s
Axe WCAG: 0 | BP: 4
Assertions ✅
- ✅: All examples have a valid semantics (R): pass
- ✅: Collapsed content is hidden from assistive technology (R): pass
Axe Best Practice Issues (4) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 4 (Gemini 3 Flash Preview)
Instruction set: 1. Basic
FAIL | Latency 0.00s
Axe WCAG: 0 | BP: 4
Assertions ❌
- ❌: All examples have a valid semantics (R): fail
- ✅: Collapsed content is hidden from assistive technology (R): pass
Axe Best Practice Issues (4) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 0 (Gemini 3 Flash Preview)
Instruction set: 0. Minimal
PASS | Latency 0.00s
Axe WCAG: 0 | BP: 2
Assertions ✅
- ✅: All examples have a valid semantics (R): pass
- ✅: Collapsed content is hidden from assistive technology (R): pass
Axe Best Practice Issues (2) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 1 (Gemini 3 Flash Preview)
Instruction set: 0. Minimal
PASS | Latency 0.00s
Axe WCAG: 0 | BP: 4
Assertions ✅
- ✅: All examples have a valid semantics (R): pass
- ✅: Collapsed content is hidden from assistive technology (R): pass
Axe Best Practice Issues (4) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 2 (Gemini 3 Flash Preview)
Instruction set: 0. Minimal
PASS | Latency 0.00s
Axe WCAG: 0 | BP: 2
Assertions ✅
- ✅: All examples have a valid semantics (R): pass
- ✅: Collapsed content is hidden from assistive technology (R): pass
Axe Best Practice Issues (2) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 3 (Gemini 3 Flash Preview)
Instruction set: 0. Minimal
PASS | Latency 0.00s
Axe WCAG: 0 | BP: 2
Assertions ✅
- ✅: All examples have a valid semantics (R): pass
- ✅: Collapsed content is hidden from assistive technology (R): pass
Axe Best Practice Issues (2) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 4 (Gemini 3 Flash Preview)
Instruction set: 0. Minimal
PASS | Latency 0.00s
Axe WCAG: 0 | BP: 2
Assertions ✅
- ✅: All examples have a valid semantics (R): pass
- ✅: Collapsed content is hidden from assistive technology (R): pass
Axe Best Practice Issues (2) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 0 (Gemini 3 Flash Preview)
Instruction set: 2. Detailed Instructions
PASS | Latency 0.00s
Axe WCAG: 0
Assertions ✅
- ✅: All examples have a valid semantics (R): pass
- ✅: Collapsed content is hidden from assistive technology (R): pass
Sample 1 (Gemini 3 Flash Preview)
Instruction set: 2. Detailed Instructions
PASS | Latency 0.00s
Axe WCAG: 0
Assertions ✅
- ✅: All examples have a valid semantics (R): pass
- ✅: Collapsed content is hidden from assistive technology (R): pass
Sample 2 (Gemini 3 Flash Preview)
Instruction set: 2. Detailed Instructions
PASS | Latency 0.00s
Axe WCAG: 0
Assertions ✅
- ✅: All examples have a valid semantics (R): pass
- ✅: Collapsed content is hidden from assistive technology (R): pass
Sample 3 (Gemini 3 Flash Preview)
Instruction set: 2. Detailed Instructions
PASS | Latency 0.00s
Axe WCAG: 0
Assertions ✅
- ✅: All examples have a valid semantics (R): pass
- ✅: Collapsed content is hidden from assistive technology (R): pass
Sample 4 (Gemini 3 Flash Preview)
Instruction set: 2. Detailed Instructions
PASS | Latency 0.00s
Axe WCAG: 0
Assertions ✅
- ✅: All examples have a valid semantics (R): pass
- ✅: Collapsed content is hidden from assistive technology (R): pass
Gemini 3.5 Pro Preview
Control: 36%
0. Minimal: 100%
1. Basic: 100%
2. Detailed Instructions: 100%
Aggregates are shown when filtering to a specific instruction set.
Instruction set: Control | Samples: 25 | Passes: 9
| pass@1 | pass@5 | pass@10 |
|---|---|---|
| 36% | 92% | 100% |
Instruction set: 0. Minimal | Samples: 5 | Passes: 5
| pass@1 | pass@5 | pass@10 |
|---|---|---|
| 100% | 100% | 100% |
Instruction set: 1. Basic | Samples: 5 | Passes: 5
| pass@1 | pass@5 | pass@10 |
|---|---|---|
| 100% | 100% | 100% |
Instruction set: 2. Detailed Instructions | Samples: 5 | Passes: 5
| pass@1 | pass@5 | pass@10 |
|---|---|---|
| 100% | 100% | 100% |
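The pass@k figures in the aggregates above match the standard combinatorial estimator, pass@k = 1 - C(n - c, k) / C(n, k), where n is the number of samples and c is the number of passes. The report does not state its exact formula, so the Python sketch below is an assumption that happens to reproduce the 25-sample row with 9 passes. The function name `pass_at_k` and the handling of k larger than n are also assumptions.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Estimate the chance that at least one of k samples passes.

    The k samples are drawn without replacement from n samples, c of which pass.
    """
    if c == 0:
        return 0.0  # no passing sample exists, so none can be drawn
    if n - c < k:
        return 1.0  # fewer than k failures: every draw of k contains a pass
    return 1.0 - comb(n - c, k) / comb(n, k)


# 25 samples with 9 passes, as in the first aggregate above.
print([round(pass_at_k(25, 9, k) * 100) for k in (1, 5, 10)])  # [36, 92, 100]
```

Under this formula pass@10 for that row is about 99.8%, which the table rounds to 100%.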
Sample 0 (Gemini 3.5 Pro Preview)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 0 | BP: 4
Assertions ❌
- ✅: All examples have a valid semantics (R): pass
- ❌: Collapsed content is hidden from assistive technology (R): fail
Axe Best Practice Issues (4) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- page-has-heading-one (moderate): Ensure that the page, or at least one of its frames contains a level-one heading (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 1 (Gemini 3.5 Pro Preview)
Instruction set: Control
PASS | Latency 0.00s
Axe WCAG: 0 | BP: 2
Assertions ✅
- ✅: All examples have a valid semantics (R): pass
- ✅: Collapsed content is hidden from assistive technology (R): pass
Axe Best Practice Issues (2) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 2 (Gemini 3.5 Pro Preview)
Instruction set: Control
PASS | Latency 0.00s
Axe WCAG: 0 | BP: 3
Assertions ✅
- ✅: All examples have a valid semantics (R): pass
- ✅: Collapsed content is hidden from assistive technology (R): pass
Axe Best Practice Issues (3) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- page-has-heading-one (moderate): Ensure that the page, or at least one of its frames contains a level-one heading (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 3 (Gemini 3.5 Pro Preview)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 0 | BP: 5
Assertions ❌
- ✅: All examples have a valid semantics (R): pass
- ❌: Collapsed content is hidden from assistive technology (R): fail
Axe Best Practice Issues (5) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 4 (Gemini 3.5 Pro Preview)
Instruction set: Control
PASS | Latency 0.00s
Axe WCAG: 0 | BP: 2
Assertions ✅
- ✅: All examples have a valid semantics (R): pass
- ✅: Collapsed content is hidden from assistive technology (R): pass
Axe Best Practice Issues (2) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 5 (Gemini 3.5 Pro Preview)
Instruction set: Control
PASS | Latency 0.00s
Axe WCAG: 0 | BP: 4
Assertions ✅
- ✅: All examples have a valid semantics (R): pass
- ✅: Collapsed content is hidden from assistive technology (R): pass
Axe Best Practice Issues (4) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- page-has-heading-one (moderate): Ensure that the page, or at least one of its frames contains a level-one heading (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 6 (Gemini 3.5 Pro Preview)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 0 | BP: 5
Assertions ❌
- ✅: All examples have a valid semantics (R): pass
- ❌: Collapsed content is hidden from assistive technology (R): fail
Axe Best Practice Issues (5) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 7 (Gemini 3.5 Pro Preview)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 0 | BP: 5
Assertions ❌
- ✅: All examples have a valid semantics (R): pass
- ❌: Collapsed content is hidden from assistive technology (R): fail
Axe Best Practice Issues (5) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 8 (Gemini 3.5 Pro Preview)
Instruction set: Control
PASS | Latency 0.00s
Axe WCAG: 0 | BP: 6
Assertions ✅
- ✅: All examples have a valid semantics (R): pass
- ✅: Collapsed content is hidden from assistive technology (R): pass
Axe Best Practice Issues (6) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 9 (Gemini 3.5 Pro Preview)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 0 | BP: 5
Assertions ❌
- ❌: All examples have a valid semantics (R): fail
- ✅: Collapsed content is hidden from assistive technology (R): pass
Axe Best Practice Issues (5) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 10 (Gemini 3.5 Pro Preview)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 0 | BP: 5
Assertions ❌
- ✅: All examples have a valid semantics (R): pass
- ❌: Collapsed content is hidden from assistive technology (R): fail
Axe Best Practice Issues (5) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 11 (Gemini 3.5 Pro Preview)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 0 | BP: 5
Assertions ❌
- ❌: All examples have a valid semantics (R): fail
- ✅: Collapsed content is hidden from assistive technology (R): pass
Axe Best Practice Issues (5) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 12 (Gemini 3.5 Pro Preview)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 0 | BP: 5
Assertions ❌
- ✅: All examples have a valid semantics (R): pass
- ❌: Collapsed content is hidden from assistive technology (R): fail
Axe Best Practice Issues (5) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 13 (Gemini 3.5 Pro Preview)
Instruction set: Control
PASS | Latency 0.00s
Axe WCAG: 0 | BP: 2
Assertions ✅
- ✅: All examples have a valid semantics (R): pass
- ✅: Collapsed content is hidden from assistive technology (R): pass
Axe Best Practice Issues (2) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 14 (Gemini 3.5 Pro Preview)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 0 | BP: 5
Assertions ❌
- ✅: All examples have a valid semantics (R): pass
- ❌: Collapsed content is hidden from assistive technology (R): fail
Axe Best Practice Issues (5) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 15 (Gemini 3.5 Pro Preview)
Instruction set: Control
PASS | Latency 0.00s
Axe WCAG: 0 | BP: 2
Assertions ✅
- ✅: All examples have a valid semantics (R): pass
- ✅: Collapsed content is hidden from assistive technology (R): pass
Axe Best Practice Issues (2) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 16 (Gemini 3.5 Pro Preview)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 0 | BP: 5
Assertions ❌
- ✅: All examples have a valid semantics (R): pass
- ❌: Collapsed content is hidden from assistive technology (R): fail
Axe Best Practice Issues (5) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 17 (Gemini 3.5 Pro Preview)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 0 | BP: 5
Assertions ❌
- ✅: All examples have a valid semantics (R): pass
- ❌: Collapsed content is hidden from assistive technology (R): fail
Axe Best Practice Issues (5) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- page-has-heading-one (moderate): Ensure that the page, or at least one of its frames contains a level-one heading (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 18 (Gemini 3.5 Pro Preview)
Instruction set: Control
PASS | Latency 0.00s
Axe WCAG: 0 | BP: 2
Assertions ✅
- ✅: All examples have a valid semantics (R): pass
- ✅: Collapsed content is hidden from assistive technology (R): pass
Axe Best Practice Issues (2) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 19 (Gemini 3.5 Pro Preview)
Instruction set: Control
PASS | Latency 0.00s
Axe WCAG: 0 | BP: 3
Assertions ✅
- ✅: All examples have a valid semantics (R): pass
- ✅: Collapsed content is hidden from assistive technology (R): pass
Axe Best Practice Issues (3) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- page-has-heading-one (moderate): Ensure that the page, or at least one of its frames contains a level-one heading (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 20 (Gemini 3.5 Pro Preview)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 0 | BP: 5
Assertions ❌
- ✅: All examples have a valid semantics (R): pass
- ❌: Collapsed content is hidden from assistive technology (R): fail
Axe Best Practice Issues (5) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 21 (Gemini 3.5 Pro Preview)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 0 | BP: 5
Assertions ❌
- ✅: All examples have a valid semantics (R): pass
- ❌: Collapsed content is hidden from assistive technology (R): fail
Axe Best Practice Issues (5) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 22 (Gemini 3.5 Pro Preview)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 0 | BP: 5
Assertions ❌
- ✅: All examples have a valid semantics (R): pass
- ❌: Collapsed content is hidden from assistive technology (R): fail
Axe Best Practice Issues (5) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 23 (Gemini 3.5 Pro Preview)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 0 | BP: 5
Assertions ❌
- ✅: All examples have a valid semantics (R): pass
- ❌: Collapsed content is hidden from assistive technology (R): fail
Axe Best Practice Issues (5) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 24 (Gemini 3.5 Pro Preview)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 0 | BP: 5
Assertions ❌
- ✅: All examples have a valid semantics (R): pass
- ❌: Collapsed content is hidden from assistive technology (R): fail
Axe Best Practice Issues (5) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 0 (Gemini 3.5 Pro Preview)
Instruction set: 1. Basic
PASS | Latency 0.00s
Axe WCAG: 0 | BP: 6
Assertions ✅
- ✅: All examples have a valid semantics (R): pass
- ✅: Collapsed content is hidden from assistive technology (R): pass
Axe Best Practice Issues (6) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 1 (Gemini 3.5 Pro Preview)
Instruction set: 1. Basic
PASS | Latency 0.00s
Axe WCAG: 0
Assertions ✅
- ✅: All examples have a valid semantics (R): pass
- ✅: Collapsed content is hidden from assistive technology (R): pass
Sample 2 (Gemini 3.5 Pro Preview)
Instruction set: 1. Basic
PASS | Latency 0.00s
Axe WCAG: 0 | BP: 2
Assertions ✅
- ✅: All examples have a valid semantics (R): pass
- ✅: Collapsed content is hidden from assistive technology (R): pass
Axe Best Practice Issues (2) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 3 (Gemini 3.5 Pro Preview)
Instruction set: 1. Basic
PASS | Latency 0.00s
Axe WCAG: 0 | BP: 2
Assertions ✅
- ✅: All examples have a valid semantics (R): pass
- ✅: Collapsed content is hidden from assistive technology (R): pass
Axe Best Practice Issues (2) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 4 (Gemini 3.5 Pro Preview)
Instruction set: 1. Basic
PASS | Latency 0.00s
Axe WCAG: 0 | BP: 7
Assertions ✅
- ✅: All examples have a valid semantics (R): pass
- ✅: Collapsed content is hidden from assistive technology (R): pass
Axe Best Practice Issues (7) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 0 (Gemini 3.5 Pro Preview)
Instruction set: 0. Minimal
PASS | Latency 0.00s
Axe WCAG: 0 | BP: 4
Assertions ✅
- ✅: All examples have a valid semantics (R): pass
- ✅: Collapsed content is hidden from assistive technology (R): pass
Axe Best Practice Issues (4) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 1 (Gemini 3.5 Pro Preview)
Instruction set: 0. Minimal
PASS | Latency 0.00s
Axe WCAG: 0 | BP: 2
Assertions ✅
- ✅: All examples have a valid semantics (R): pass
- ✅: Collapsed content is hidden from assistive technology (R): pass
Axe Best Practice Issues (2) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 2 (Gemini 3.5 Pro Preview)
Instruction set: 0. Minimal
PASS | Latency 0.00s
Axe WCAG: 0 | BP: 4
Assertions ✅
- ✅: All examples have a valid semantics (R): pass
- ✅: Collapsed content is hidden from assistive technology (R): pass
Axe Best Practice Issues (4) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 3 (Gemini 3.5 Pro Preview)
Instruction set: 0. Minimal
PASS | Latency 0.00s
Axe WCAG: 0 | BP: 2
Assertions ✅
- ✅: All examples have a valid semantics (R): pass
- ✅: Collapsed content is hidden from assistive technology (R): pass
Axe Best Practice Issues (2) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 4 (Gemini 3.5 Pro Preview)
Instruction set: 0. Minimal
PASS | Latency 0.00s
Axe WCAG: 0 | BP: 5
Assertions ✅
- ✅: All examples have a valid semantics (R): pass
- ✅: Collapsed content is hidden from assistive technology (R): pass
Axe Best Practice Issues (5) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 0 (Gemini 3.5 Pro Preview)
Instruction set: 2. Detailed Instructions
PASS | Latency 0.00s
Axe WCAG: 0
Assertions ✅
- ✅: All examples have a valid semantics (R): pass
- ✅: Collapsed content is hidden from assistive technology (R): pass
Sample 1 (Gemini 3.5 Pro Preview)
Instruction set: 2. Detailed Instructions
PASS | Latency 0.00s
Axe WCAG: 0
Assertions ✅
- ✅: All examples have a valid semantics (R): pass
- ✅: Collapsed content is hidden from assistive technology (R): pass
Sample 2 (Gemini 3.5 Pro Preview)
Instruction set: 2. Detailed Instructions
PASS | Latency 0.00s
Axe WCAG: 0 | BP: 1
Assertions ✅
- ✅: All examples have a valid semantics (R): pass
- ✅: Collapsed content is hidden from assistive technology (R): pass
Axe Best Practice Issues (1) ⚠️
- heading-order (moderate): Ensure the order of headings is semantically correct (Best Practice - does not affect pass/fail)
Sample 3 (Gemini 3.5 Pro Preview)
Instruction set: 2. Detailed Instructions
PASS | Latency 0.00s
Axe WCAG: 0
Assertions ✅
- ✅: All examples have a valid semantics (R): pass
- ✅: Collapsed content is hidden from assistive technology (R): pass
Sample 4 (Gemini 3.5 Pro Preview)
Instruction set: 2. Detailed Instructions
PASS | Latency 0.00s
Axe WCAG: 0
Assertions ✅
- ✅: All examples have a valid semantics (R): pass
- ✅: Collapsed content is hidden from assistive technology (R): pass
GPT-5 Mini
Aggregates are shown when filtering to a specific instruction set.
| Instruction set | Samples | Passes | pass@1 | pass@5 | pass@10 |
|---|---|---|---|---|---|
| Control | 25 | 12 | 48% | 98% | 100% |
| 0. Minimal | 5 | 5 | 100% | 100% | 100% |
| 1. Basic | 5 | 4 | 80% | 100% | 100% |
| 2. Detailed Instructions | 5 | 5 | 100% | 100% | 100% |
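The pass@k figures reported here are consistent with the commonly used unbiased estimator, 1 - C(n - c, k) / C(n, k), where n is the number of samples and c the number of passes. The report does not state which formula it uses, so the sketch below is an assumption that happens to reproduce the rounded values; the function name `pass_at_k` and the choice to cap k at n (needed for the 5-sample instruction sets at k = 10) are illustrative, not taken from the benchmark's code.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Estimate the chance that at least one of k samples passes.

    n: total samples drawn, c: samples that passed, k: draw size.
    Assumes sampling without replacement from the n recorded samples.
    """
    if n <= 0 or not 0 <= c <= n or k <= 0:
        raise ValueError("expected n > 0, 0 <= c <= n, k > 0")
    # Assumption: when fewer than k samples exist, use all n of them.
    k = min(k, n)
    # Fewer than k failing samples means every draw of k contains a pass.
    if n - c < k:
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)


if __name__ == "__main__":
    # Control run with 25 samples and 12 passes.
    for k in (1, 5, 10):
        print(f"pass@{k}: {pass_at_k(25, 12, k):.0%}")
```

With 25 samples and 12 passes this gives 48%, 98% and 100% after rounding, matching the control row for GPT-5 Mini.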
Sample 0 (GPT-5 Mini)
Instruction set: Control
PASS | Latency 0.00s
Axe WCAG: 0 | BP: 6
Assertions ✅
- ✅: All examples have a valid semantics (R): pass
- ✅: Collapsed content is hidden from assistive technology (R): pass
Axe Best Practice Issues (6) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 1 (GPT-5 Mini)
Instruction set: Control
PASS | Latency 0.00s
Axe WCAG: 0
Assertions ✅
- ✅: All examples have a valid semantics (R): pass
- ✅: Collapsed content is hidden from assistive technology (R): pass
Sample 2 (GPT-5 Mini)
Instruction set: Control
PASS | Latency 0.00s
Axe WCAG: 0 | BP: 4
Assertions ✅
- ✅: All examples have a valid semantics (R): pass
- ✅: Collapsed content is hidden from assistive technology (R): pass
Axe Best Practice Issues (4) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 3 (GPT-5 Mini)
Instruction set: Control
PASS | Latency 0.00s
Axe WCAG: 0 | BP: 4
Assertions ✅
- ✅: All examples have a valid semantics (R): pass
- ✅: Collapsed content is hidden from assistive technology (R): pass
Axe Best Practice Issues (4) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 4 (GPT-5 Mini)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 2 | BP: 10
Assertions ❌
- ✅: All examples have a valid semantics (R): pass
- ❌: Collapsed content is hidden from assistive technology (R): fail
Axe WCAG Failures (2) ❌
- (2x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (10) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 5 (GPT-5 Mini)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 0 | BP: 6
Assertions ❌
- ✅: All examples have a valid semantics (R): pass
- ❌: Collapsed content is hidden from assistive technology (R): fail
Axe Best Practice Issues (6) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 6 (GPT-5 Mini)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 1
Assertions ✅
- ✅: All examples have a valid semantics (R): pass
- ✅: Collapsed content is hidden from assistive technology (R): pass
Axe WCAG Failures (1) ❌
- (1x) - aria-required-children (critical): Ensure elements with an ARIA role that require child roles contain them
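The sample records in this section appear to follow one verdict rule: a sample passes only when it has zero Axe WCAG failures and every assertion passes, while best-practice issues are reported but never change the outcome. The sketch below encodes that reading; the `SampleResult` type, its field names, and the `verdict` function are hypothetical and are not taken from the benchmark's code.

```python
from dataclasses import dataclass, field


@dataclass
class SampleResult:
    """Hypothetical record mirroring the fields printed for each sample."""
    axe_wcag_failures: int
    best_practice_issues: int = 0
    # Maps assertion name to True (pass) or False (fail).
    assertions: dict[str, bool] = field(default_factory=dict)


def verdict(result: SampleResult) -> str:
    """Return PASS only if there are no Axe WCAG failures and no failed assertions.

    Best-practice issues are deliberately ignored, matching the
    "does not affect pass/fail" note on each best-practice line.
    """
    assertions_ok = all(result.assertions.values())
    return "PASS" if result.axe_wcag_failures == 0 and assertions_ok else "FAIL"


if __name__ == "__main__":
    both_ok = {"valid semantics": True, "collapsed content hidden": True}
    # Assertions pass but one Axe WCAG failure remains.
    print(verdict(SampleResult(axe_wcag_failures=1, assertions=both_ok)))
    # Six best-practice issues, no WCAG failures, assertions pass.
    print(verdict(SampleResult(0, best_practice_issues=6, assertions=both_ok)))
```

Under this rule a sample with passing assertions can still fail on a single Axe WCAG finding, and a sample with several best-practice issues can still pass.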
Sample 7 (GPT-5 Mini)
Instruction set: Control
PASS | Latency 0.00s
Axe WCAG: 0 | BP: 4
Assertions ✅
- ✅: All examples have a valid semantics (R): pass
- ✅: Collapsed content is hidden from assistive technology (R): pass
Axe Best Practice Issues (4) ⚠️
- heading-order (moderate): Ensure the order of headings is semantically correct (Best Practice - does not affect pass/fail)
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 8 (GPT-5 Mini)
Instruction set: Control
PASS | Latency 0.00s
Axe WCAG: 0 | BP: 4
Assertions ✅
- ✅: All examples have a valid semantics (R): pass
- ✅: Collapsed content is hidden from assistive technology (R): pass
Axe Best Practice Issues (4) ⚠️
- heading-order (moderate): Ensure the order of headings is semantically correct (Best Practice - does not affect pass/fail)
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 9 (GPT-5 Mini)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 1
Assertions ❌
- ✅: All examples have a valid semantics (R): pass
- ❌: Collapsed content is hidden from assistive technology (R): fail
Axe WCAG Failures (1) ❌
- (1x) - html-has-lang (serious): Ensure every HTML document has a lang attribute
Sample 10 (GPT-5 Mini)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 1 | BP: 2
Assertions ✅
- ✅: All examples have a valid semantics (R): pass
- ✅: Collapsed content is hidden from assistive technology (R): pass
Axe WCAG Failures (1) ❌
- (1x) - aria-hidden-focus (serious): Ensure aria-hidden elements are not focusable nor contain focusable elements
Axe Best Practice Issues (2) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 11 (GPT-5 Mini)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 1 | BP: 1
Assertions ✅
- ✅: All examples have a valid semantics (R): pass
- ✅: Collapsed content is hidden from assistive technology (R): pass
Axe WCAG Failures (1) ❌
- (1x) - aria-hidden-focus (serious): Ensure aria-hidden elements are not focusable nor contain focusable elements
Axe Best Practice Issues (1) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
Sample 12 (GPT-5 Mini)
Instruction set: Control
PASS | Latency 0.00s
Axe WCAG: 0 | BP: 3
Assertions ✅
- ✅: All examples have a valid semantics (R): pass
- ✅: Collapsed content is hidden from assistive technology (R): pass
Axe Best Practice Issues (3) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 13 (GPT-5 Mini)
Instruction set: Control
PASS | Latency 0.00s
Axe WCAG: 0 | BP: 3
Assertions ✅
- ✅: All examples have a valid semantics (R): pass
- ✅: Collapsed content is hidden from assistive technology (R): pass
Axe Best Practice Issues (3) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 14 (GPT-5 Mini)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 1 | BP: 4
Assertions ❌
- ✅: All examples have a valid semantics (R): pass
- ❌: Collapsed content is hidden from assistive technology (R): fail
Axe WCAG Failures (1) ❌
- (1x) - aria-hidden-focus (serious): Ensure aria-hidden elements are not focusable nor contain focusable elements
Axe Best Practice Issues (4) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 15 (GPT-5 Mini)
Instruction set: Control
PASS | Latency 0.00s
Axe WCAG: 0 | BP: 2
Assertions ✅
- ✅: All examples have a valid semantics (R): pass
- ✅: Collapsed content is hidden from assistive technology (R): pass
Axe Best Practice Issues (2) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 16 (GPT-5 Mini)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 1
Assertions ❌
- ✅: All examples have a valid semantics (R): pass
- ❌: Collapsed content is hidden from assistive technology (R): fail
Axe WCAG Failures (1) ❌
- (1x) - aria-hidden-focus (serious): Ensure aria-hidden elements are not focusable nor contain focusable elements
Sample 17 (GPT-5 Mini)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 9 | BP: 1
Assertions ✅
- ✅: All examples have a valid semantics (R): pass
- ✅: Collapsed content is hidden from assistive technology (R): pass
Axe WCAG Failures (9) ❌
- (1x) - aria-hidden-focus (serious): Ensure aria-hidden elements are not focusable nor contain focusable elements
- (8x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (1) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
Sample 18 (GPT-5 Mini)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 1 | BP: 3
Assertions ✅
- ✅: All examples have a valid semantics (R): pass
- ✅: Collapsed content is hidden from assistive technology (R): pass
Axe WCAG Failures (1) ❌
- (1x) - aria-hidden-focus (serious): Ensure aria-hidden elements are not focusable nor contain focusable elements
Axe Best Practice Issues (3) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 19 (GPT-5 Mini)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 0 | BP: 3
Assertions ❌
- ✅: All examples have a valid semantics (R): pass
- ❌: Collapsed content is hidden from assistive technology (R): fail
Axe Best Practice Issues (3) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 20 (GPT-5 Mini)
Instruction set: Control
PASS | Latency 0.00s
Axe WCAG: 0
Assertions ✅
- ✅: All examples have a valid semantics (R): pass
- ✅: Collapsed content is hidden from assistive technology (R): pass
Sample 21 (GPT-5 Mini)
Instruction set: Control
PASS | Latency 0.00s
Axe WCAG: 0 | BP: 3
Assertions ✅
- ✅: All examples have a valid semantics (R): pass
- ✅: Collapsed content is hidden from assistive technology (R): pass
Axe Best Practice Issues (3) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 22 (GPT-5 Mini)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 0 | BP: 4
Assertions ❌
- ✅: All examples have a valid semantics (R): pass
- ❌: Collapsed content is hidden from assistive technology (R): fail
Axe Best Practice Issues (4) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 23 (GPT-5 Mini)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 0 | BP: 5
Assertions ❌
- ✅: All examples have a valid semantics (R): pass
- ❌: Collapsed content is hidden from assistive technology (R): fail
Axe Best Practice Issues (5) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 24 (GPT-5 Mini)
Instruction set: Control
PASS | Latency 0.00s
Axe WCAG: 0 | BP: 3
Assertions ✅
- ✅: All examples have a valid semantics (R): pass
- ✅: Collapsed content is hidden from assistive technology (R): pass
Axe Best Practice Issues (3) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 0 (GPT-5 Mini)
Instruction set: 1. Basic
PASS | Latency 0.00s
Axe WCAG: 0
Assertions ✅
- ✅: All examples have a valid semantics (R): pass
- ✅: Collapsed content is hidden from assistive technology (R): pass
Sample 1 (GPT-5 Mini)
Instruction set: 1. Basic
PASS | Latency 0.00s
Axe WCAG: 0
Assertions ✅
- ✅: All examples have a valid semantics (R): pass
- ✅: Collapsed content is hidden from assistive technology (R): pass
Sample 2 (GPT-5 Mini)
Instruction set: 1. Basic
FAIL | Latency 0.00s
Axe WCAG: 0
Assertions ❌
- ❌: All examples have a valid semantics (R): fail
- ✅: Collapsed content is hidden from assistive technology (R): pass
Sample 3 (GPT-5 Mini)
Instruction set: 1. Basic
PASS | Latency 0.00s
Axe WCAG: 0
Assertions ✅
- ✅: All examples have a valid semantics (R): pass
- ✅: Collapsed content is hidden from assistive technology (R): pass
Sample 4 (GPT-5 Mini)
Instruction set: 1. Basic
PASS | Latency 0.00s
Axe WCAG: 0
Assertions ✅
- ✅: All examples have a valid semantics (R): pass
- ✅: Collapsed content is hidden from assistive technology (R): pass
Sample 0 (GPT-5 Mini)
Instruction set: 0. Minimal
PASS | Latency 0.00s
Axe WCAG: 0
Assertions ✅
- ✅: All examples have a valid semantics (R): pass
- ✅: Collapsed content is hidden from assistive technology (R): pass
Sample 1 (GPT-5 Mini)
Instruction set: 0. Minimal
PASS | Latency 0.00s
Axe WCAG: 0
Assertions ✅
- ✅: All examples have a valid semantics (R): pass
- ✅: Collapsed content is hidden from assistive technology (R): pass
Sample 2 (GPT-5 Mini)
Instruction set: 0. Minimal
PASS | Latency 0.00s
Axe WCAG: 0
Assertions ✅
- ✅: All examples have a valid semantics (R): pass
- ✅: Collapsed content is hidden from assistive technology (R): pass
Sample 3 (GPT-5 Mini)
Instruction set: 0. Minimal
PASS | Latency 0.00s
Axe WCAG: 0 | BP: 1
Assertions ✅
- ✅: All examples have a valid semantics (R): pass
- ✅: Collapsed content is hidden from assistive technology (R): pass
Axe Best Practice Issues (1) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
Sample 4 (GPT-5 Mini)
Instruction set: 0. Minimal
PASS | Latency 0.00s
Axe WCAG: 0
Assertions ✅
- ✅: All examples have a valid semantics (R): pass
- ✅: Collapsed content is hidden from assistive technology (R): pass
Sample 0 (GPT-5 Mini)
Instruction set: 2. Detailed Instructions
PASS | Latency 0.00s
Axe WCAG: 0
Assertions ✅
- ✅: All examples have a valid semantics (R): pass
- ✅: Collapsed content is hidden from assistive technology (R): pass
Sample 1 (GPT-5 Mini)
Instruction set: 2. Detailed Instructions
PASS | Latency 0.00s
Axe WCAG: 0
Assertions ✅
- ✅: All examples have a valid semantics (R): pass
- ✅: Collapsed content is hidden from assistive technology (R): pass
Sample 2 (GPT-5 Mini)
Instruction set: 2. Detailed Instructions
PASS | Latency 0.00s
Axe WCAG: 0
Assertions ✅
- ✅: All examples have a valid semantics (R): pass
- ✅: Collapsed content is hidden from assistive technology (R): pass
Sample 3 (GPT-5 Mini)
Instruction set: 2. Detailed Instructions
PASS | Latency 0.00s
Axe WCAG: 0
Assertions ✅
- ✅: All examples have a valid semantics (R): pass
- ✅: Collapsed content is hidden from assistive technology (R): pass
Sample 4 (GPT-5 Mini)
Instruction set: 2. Detailed Instructions
PASS | Latency 0.00s
Axe WCAG: 0
Assertions ✅
- ✅: All examples have a valid semantics (R): pass
- ✅: Collapsed content is hidden from assistive technology (R): pass
GPT-5.2
Aggregates are shown when filtering to a specific instruction set.
| Samples | Passes | pass@1 | pass@5 | pass@10 |
|---|---|---|---|---|
| 25 | 14 | 56% | 99% | 100% |
| 5 | 5 | 100% | 100% | 100% |
| 5 | 5 | 100% | 100% | 100% |
| 5 | 2 | 40% | 100% | 100% |
Sample 0 (GPT-5.2)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 0
Assertions ❌
- ✅: All examples have a valid semantics (R): pass
- ❌: Collapsed content is hidden from assistive technology (R): fail
Sample 1 (GPT-5.2)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 0
Assertions ❌
- ✅: All examples have a valid semantics (R): pass
- ❌: Collapsed content is hidden from assistive technology (R): fail
Sample 2 (GPT-5.2)
Instruction set: Control
PASS | Latency 0.00s
Axe WCAG: 0
Assertions ✅
- ✅: All examples have a valid semantics (R): pass
- ✅: Collapsed content is hidden from assistive technology (R): pass
Sample 3 (GPT-5.2)
Instruction set: Control
PASS | Latency 0.00s
Axe WCAG: 0
Assertions ✅
- ✅: All examples have a valid semantics (R): pass
- ✅: Collapsed content is hidden from assistive technology (R): pass
Sample 4 (GPT-5.2)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 0
Assertions ❌
- ✅: All examples have a valid semantics (R): pass
- ❌: Collapsed content is hidden from assistive technology (R): fail
Sample 5 (GPT-5.2)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 0
Assertions ❌
- ✅: All examples have a valid semantics (R): pass
- ❌: Collapsed content is hidden from assistive technology (R): fail
Sample 6 (GPT-5.2)
Instruction set: Control
PASS | Latency 0.00s
Axe WCAG: 0
Assertions ✅
- ✅: All examples have a valid semantics (R): pass
- ✅: Collapsed content is hidden from assistive technology (R): pass
Sample 7 (GPT-5.2)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 0
Assertions ❌
- ✅: All examples have a valid semantics (R): pass
- ❌: Collapsed content is hidden from assistive technology (R): fail
Sample 8 (GPT-5.2)
Instruction set: Control
PASS | Latency 0.00s
Axe WCAG: 0 | BP: 1
Assertions ✅
- ✅: All examples have a valid semantics (R): pass
- ✅: Collapsed content is hidden from assistive technology (R): pass
Axe Best Practice Issues (1) ⚠️
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
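The records in this listing are consistent with a simple verdict rule: a sample passes when axe reports zero WCAG failures and every assertion passes, while best-practice issues are reported but do not change the result. The sketch below encodes that reading. It is inferred from the records shown here rather than taken from the benchmark's source, and the `SampleResult` type and its field names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class SampleResult:
    axe_wcag_failures: int        # count shown after "Axe WCAG:"
    assertions: dict[str, bool]   # assertion name -> passed
    best_practice_issues: int = 0 # count shown after "BP:"; informational only


def sample_passes(result: SampleResult) -> bool:
    """PASS requires no axe WCAG failures and no failed assertion.
    Best-practice issues are deliberately ignored."""
    return result.axe_wcag_failures == 0 and all(result.assertions.values())
```

Under this rule a record with `BP: 1` and both assertions passing is still a PASS, while a record with `Axe WCAG: 2` fails even when both assertions pass.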
Sample 9 (GPT-5.2)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 0
Assertions ❌
- ✅: All examples have a valid semantics (R): pass
- ❌: Collapsed content is hidden from assistive technology (R): fail
Sample 10 (GPT-5.2)
Instruction set: Control
PASS | Latency 0.00s
Axe WCAG: 0 | BP: 1
Assertions ✅
- ✅: All examples have a valid semantics (R): pass
- ✅: Collapsed content is hidden from assistive technology (R): pass
Axe Best Practice Issues (1) ⚠️
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 11 (GPT-5.2)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 0
Assertions ❌
- ✅: All examples have a valid semantics (R): pass
- ❌: Collapsed content is hidden from assistive technology (R): fail
Sample 12 (GPT-5.2)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 0
Assertions ❌
- ✅: All examples have a valid semantics (R): pass
- ❌: Collapsed content is hidden from assistive technology (R): fail
Sample 13 (GPT-5.2)
Instruction set: Control
PASS | Latency 0.00s
Axe WCAG: 0
Assertions ✅
- ✅: All examples have a valid semantics (R): pass
- ✅: Collapsed content is hidden from assistive technology (R): pass
Sample 14 (GPT-5.2)
Instruction set: Control
PASS | Latency 0.00s
Axe WCAG: 0
Assertions ✅
- ✅: All examples have a valid semantics (R): pass
- ✅: Collapsed content is hidden from assistive technology (R): pass
Sample 15 (GPT-5.2)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 0
Assertions ❌
- ✅: All examples have a valid semantics (R): pass
- ❌: Collapsed content is hidden from assistive technology (R): fail
Sample 16 (GPT-5.2)
Instruction set: Control
PASS | Latency 0.00s
Axe WCAG: 0
Assertions ✅
- ✅: All examples have a valid semantics (R): pass
- ✅: Collapsed content is hidden from assistive technology (R): pass
Sample 17 (GPT-5.2)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 0
Assertions ❌
- ✅: All examples have a valid semantics (R): pass
- ❌: Collapsed content is hidden from assistive technology (R): fail
Sample 18 (GPT-5.2)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 0
Assertions ❌
- ✅: All examples have a valid semantics (R): pass
- ❌: Collapsed content is hidden from assistive technology (R): fail
Sample 19 (GPT-5.2)
Instruction set: Control
PASS | Latency 0.00s
Axe WCAG: 0
Assertions ✅
- ✅: All examples have a valid semantics (R): pass
- ✅: Collapsed content is hidden from assistive technology (R): pass
Sample 20 (GPT-5.2)
Instruction set: Control
PASS | Latency 0.00s
Axe WCAG: 0
Assertions ✅
- ✅: All examples have a valid semantics (R): pass
- ✅: Collapsed content is hidden from assistive technology (R): pass
Sample 21 (GPT-5.2)
Instruction set: Control
PASS | Latency 0.00s
Axe WCAG: 0
Assertions ✅
- ✅: All examples have a valid semantics (R): pass
- ✅: Collapsed content is hidden from assistive technology (R): pass
Sample 22 (GPT-5.2)
Instruction set: Control
PASS | Latency 0.00s
Axe WCAG: 0
Assertions ✅
- ✅: All examples have a valid semantics (R): pass
- ✅: Collapsed content is hidden from assistive technology (R): pass
Sample 23 (GPT-5.2)
Instruction set: Control
PASS | Latency 0.00s
Axe WCAG: 0
Assertions ✅
- ✅: All examples have a valid semantics (R): pass
- ✅: Collapsed content is hidden from assistive technology (R): pass
Sample 24 (GPT-5.2)
Instruction set: Control
PASS | Latency 0.00s
Axe WCAG: 0
Assertions ✅
- ✅: All examples have a valid semantics (R): pass
- ✅: Collapsed content is hidden from assistive technology (R): pass
Sample 0 (GPT-5.2)
Instruction set: 1. Basic
PASS | Latency 0.00s
Axe WCAG: 0
Assertions ✅
- ✅: All examples have a valid semantics (R): pass
- ✅: Collapsed content is hidden from assistive technology (R): pass
Sample 1 (GPT-5.2)
Instruction set: 1. Basic
PASS | Latency 0.00s
Axe WCAG: 0
Assertions ✅
- ✅: All examples have a valid semantics (R): pass
- ✅: Collapsed content is hidden from assistive technology (R): pass
Sample 2 (GPT-5.2)
Instruction set: 1. Basic
PASS | Latency 0.00s
Axe WCAG: 0
Assertions ✅
- ✅: All examples have a valid semantics (R): pass
- ✅: Collapsed content is hidden from assistive technology (R): pass
Sample 3 (GPT-5.2)
Instruction set: 1. Basic
PASS | Latency 0.00s
Axe WCAG: 0
Assertions ✅
- ✅: All examples have a valid semantics (R): pass
- ✅: Collapsed content is hidden from assistive technology (R): pass
Sample 4 (GPT-5.2)
Instruction set: 1. Basic
PASS | Latency 0.00s
Axe WCAG: 0
Assertions ✅
- ✅: All examples have a valid semantics (R): pass
- ✅: Collapsed content is hidden from assistive technology (R): pass
Sample 0 (GPT-5.2)
Instruction set: 0. Minimal
PASS | Latency 0.00s
Axe WCAG: 0
Assertions ✅
- ✅: All examples have a valid semantics (R): pass
- ✅: Collapsed content is hidden from assistive technology (R): pass
Sample 1 (GPT-5.2)
Instruction set: 0. Minimal
PASS | Latency 0.00s
Axe WCAG: 0
Assertions ✅
- ✅: All examples have a valid semantics (R): pass
- ✅: Collapsed content is hidden from assistive technology (R): pass
Sample 2 (GPT-5.2)
Instruction set: 0. Minimal
PASS | Latency 0.00s
Axe WCAG: 0
Assertions ✅
- ✅: All examples have a valid semantics (R): pass
- ✅: Collapsed content is hidden from assistive technology (R): pass
Sample 3 (GPT-5.2)
Instruction set: 0. Minimal
PASS | Latency 0.00s
Axe WCAG: 0
Assertions ✅
- ✅: All examples have a valid semantics (R): pass
- ✅: Collapsed content is hidden from assistive technology (R): pass
Sample 4 (GPT-5.2)
Instruction set: 0. Minimal
PASS | Latency 0.00s
Axe WCAG: 0
Assertions ✅
- ✅: All examples have a valid semantics (R): pass
- ✅: Collapsed content is hidden from assistive technology (R): pass
Sample 0 (GPT-5.2)
Instruction set: 2. Detailed Instructions
FAIL | Latency 0.00s
Axe WCAG: 0
Assertions ❌
- ❌: All examples have a valid semantics (R): fail
- ✅: Collapsed content is hidden from assistive technology (R): pass
Sample 1 (GPT-5.2)
Instruction set: 2. Detailed Instructions
FAIL | Latency 0.00s
Axe WCAG: 0
Assertions ❌
- ❌: All examples have a valid semantics (R): fail
- ✅: Collapsed content is hidden from assistive technology (R): pass
Sample 2 (GPT-5.2)
Instruction set: 2. Detailed Instructions
PASS | Latency 0.00s
Axe WCAG: 0
Assertions ✅
- ✅: All examples have a valid semantics (R): pass
- ✅: Collapsed content is hidden from assistive technology (R): pass
Sample 3 (GPT-5.2)
Instruction set: 2. Detailed Instructions
PASS | Latency 0.00s
Axe WCAG: 0
Assertions ✅
- ✅: All examples have a valid semantics (R): pass
- ✅: Collapsed content is hidden from assistive technology (R): pass
Sample 4 (GPT-5.2)
Instruction set: 2. Detailed Instructions
FAIL | Latency 0.00s
Axe WCAG: 0
Assertions ❌
- ❌: All examples have a valid semantics (R): fail
- ✅: Collapsed content is hidden from assistive technology (R): pass
GPT-5.2 Codex
Pass rate by instruction set: Control 88% | 0. Minimal 100% | 1. Basic 100% | 2. Detailed Instructions 100%
Aggregates are shown when filtering to a specific instruction set.
Samples: 25 | Passes: 22
| pass@1 | pass@5 | pass@10 |
|---|---|---|
| 88% | 100% | 100% |
Samples: 5 | Passes: 5
| pass@1 | pass@5 | pass@10 |
|---|---|---|
| 100% | 100% | 100% |
Samples: 5 | Passes: 5
| pass@1 | pass@5 | pass@10 |
|---|---|---|
| 100% | 100% | 100% |
Samples: 5 | Passes: 5
| pass@1 | pass@5 | pass@10 |
|---|---|---|
| 100% | 100% | 100% |
Sample 0 (GPT-5.2 Codex)
Instruction set: Control
PASS | Latency 0.00s
Axe WCAG: 0 | BP: 2
Assertions ✅
- ✅: All examples have a valid semantics (R): pass
- ✅: Collapsed content is hidden from assistive technology (R): pass
Axe Best Practice Issues (2) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 1 (GPT-5.2 Codex)
Instruction set: Control
PASS | Latency 0.00s
Axe WCAG: 0 | BP: 2
Assertions ✅
- ✅: All examples have a valid semantics (R): pass
- ✅: Collapsed content is hidden from assistive technology (R): pass
Axe Best Practice Issues (2) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- page-has-heading-one (moderate): Ensure that the page, or at least one of its frames contains a level-one heading (Best Practice - does not affect pass/fail)
Sample 2 (GPT-5.2 Codex)
Instruction set: Control
PASS | Latency 0.00s
Axe WCAG: 0 | BP: 2
Assertions ✅
- ✅: All examples have a valid semantics (R): pass
- ✅: Collapsed content is hidden from assistive technology (R): pass
Axe Best Practice Issues (2) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- page-has-heading-one (moderate): Ensure that the page, or at least one of its frames contains a level-one heading (Best Practice - does not affect pass/fail)
Sample 3 (GPT-5.2 Codex)
Instruction set: Control
PASS | Latency 0.00s
Axe WCAG: 0 | BP: 2
Assertions ✅
- ✅: All examples have a valid semantics (R): pass
- ✅: Collapsed content is hidden from assistive technology (R): pass
Axe Best Practice Issues (2) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- page-has-heading-one (moderate): Ensure that the page, or at least one of its frames contains a level-one heading (Best Practice - does not affect pass/fail)
Sample 4 (GPT-5.2 Codex)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 2 | BP: 2
Assertions ✅
- ✅: All examples have a valid semantics (R): pass
- ✅: Collapsed content is hidden from assistive technology (R): pass
Axe WCAG Failures (2) ❌
- (2x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (2) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- page-has-heading-one (moderate): Ensure that the page, or at least one of its frames contains a level-one heading (Best Practice - does not affect pass/fail)
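The `color-contrast` failures refer to the WCAG 2 AA minimum contrast ratio. As a reference point, the sketch below computes that ratio from the WCAG definition of relative luminance and compares it against the AA thresholds (4.5:1 for normal text, 3:1 for large text). It is a simplified illustration, not axe's implementation: axe also has to work out the effective background and text size for each element. The function names are hypothetical and the colors in the usage note are not taken from the failing samples.

```python
def _linearize(channel: int) -> float:
    """Convert an 8-bit sRGB channel to linear light, per the WCAG 2 definition."""
    c = channel / 255
    return c / 12.92 if c <= 0.03928 else ((c + 0.055) / 1.055) ** 2.4


def relative_luminance(hex_color: str) -> float:
    """Relative luminance of a color written as '#rrggbb'."""
    h = hex_color.lstrip("#")
    r, g, b = (int(h[i:i + 2], 16) for i in (0, 2, 4))
    return 0.2126 * _linearize(r) + 0.7152 * _linearize(g) + 0.0722 * _linearize(b)


def contrast_ratio(foreground: str, background: str) -> float:
    """WCAG contrast ratio, from 1.0 (identical colors) to 21.0 (black on white)."""
    l1 = relative_luminance(foreground)
    l2 = relative_luminance(background)
    lighter, darker = max(l1, l2), min(l1, l2)
    return (lighter + 0.05) / (darker + 0.05)


def meets_aa(foreground: str, background: str, large_text: bool = False) -> bool:
    """True if the pair meets the WCAG 2 AA minimum contrast threshold."""
    return contrast_ratio(foreground, background) >= (3.0 if large_text else 4.5)
```

For example, `#767676` text on a `#ffffff` background clears the 4.5:1 threshold, while the slightly lighter `#777777` does not.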
Sample 5 (GPT-5.2 Codex)
Instruction set: Control
PASS | Latency 0.00s
Axe WCAG: 0 | BP: 2
Assertions ✅
- ✅: All examples have a valid semantics (R): pass
- ✅: Collapsed content is hidden from assistive technology (R): pass
Axe Best Practice Issues (2) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- page-has-heading-one (moderate): Ensure that the page, or at least one of its frames contains a level-one heading (Best Practice - does not affect pass/fail)
Sample 6 (GPT-5.2 Codex)
Instruction set: Control
PASS | Latency 0.00s
Axe WCAG: 0 | BP: 2
Assertions ✅
- ✅: All examples have a valid semantics (R): pass
- ✅: Collapsed content is hidden from assistive technology (R): pass
Axe Best Practice Issues (2) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 7 (GPT-5.2 Codex)
Instruction set: Control
PASS | Latency 0.00s
Axe WCAG: 0 | BP: 2
Assertions ✅
- ✅: All examples have a valid semantics (R): pass
- ✅: Collapsed content is hidden from assistive technology (R): pass
Axe Best Practice Issues (2) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- page-has-heading-one (moderate): Ensure that the page, or at least one of its frames contains a level-one heading (Best Practice - does not affect pass/fail)
Sample 8 (GPT-5.2 Codex)
Instruction set: Control
PASS | Latency 0.00s
Axe WCAG: 0 | BP: 2
Assertions ✅
- ✅: All examples have a valid semantics (R): pass
- ✅: Collapsed content is hidden from assistive technology (R): pass
Axe Best Practice Issues (2) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- page-has-heading-one (moderate): Ensure that the page, or at least one of its frames contains a level-one heading (Best Practice - does not affect pass/fail)
Sample 9 (GPT-5.2 Codex)
Instruction set: Control
PASS | Latency 0.00s
Axe WCAG: 0 | BP: 2
Assertions ✅
- ✅: All examples have a valid semantics (R): pass
- ✅: Collapsed content is hidden from assistive technology (R): pass
Axe Best Practice Issues (2) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- page-has-heading-one (moderate): Ensure that the page, or at least one of its frames contains a level-one heading (Best Practice - does not affect pass/fail)
Sample 10 (GPT-5.2 Codex)
Instruction set: Control
PASS | Latency 0.00s
Axe WCAG: 0 | BP: 2
Assertions ✅
- ✅: All examples have a valid semantics (R): pass
- ✅: Collapsed content is hidden from assistive technology (R): pass
Axe Best Practice Issues (2) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- page-has-heading-one (moderate): Ensure that the page, or at least one of its frames contains a level-one heading (Best Practice - does not affect pass/fail)
Sample 11 (GPT-5.2 Codex)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 2 | BP: 2
Assertions ✅
- ✅: All examples have a valid semantics (R): pass
- ✅: Collapsed content is hidden from assistive technology (R): pass
Axe WCAG Failures (2) ❌
- (2x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (2) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- page-has-heading-one (moderate): Ensure that the page, or at least one of its frames contains a level-one heading (Best Practice - does not affect pass/fail)
Sample 12 (GPT-5.2 Codex)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 0 | BP: 2
Assertions ❌
- ✅: All examples have a valid semantics (R): pass
- ❌: Collapsed content is hidden from assistive technology (R): fail
Axe Best Practice Issues (2) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- page-has-heading-one (moderate): Ensure that the page, or at least one of its frames contains a level-one heading (Best Practice - does not affect pass/fail)
Sample 13 (GPT-5.2 Codex)
Instruction set: Control
PASS | Latency 0.00s
Axe WCAG: 0 | BP: 2
Assertions ✅
- ✅: All examples have a valid semantics (R): pass
- ✅: Collapsed content is hidden from assistive technology (R): pass
Axe Best Practice Issues (2) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- page-has-heading-one (moderate): Ensure that the page, or at least one of its frames contains a level-one heading (Best Practice - does not affect pass/fail)
Sample 14 (GPT-5.2 Codex)
Instruction set: Control
PASS | Latency 0.00s
Axe WCAG: 0 | BP: 2
Assertions ✅
- ✅: All examples have a valid semantics (R): pass
- ✅: Collapsed content is hidden from assistive technology (R): pass
Axe Best Practice Issues (2) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- page-has-heading-one (moderate): Ensure that the page, or at least one of its frames contains a level-one heading (Best Practice - does not affect pass/fail)
Sample 15 (GPT-5.2 Codex)
Instruction set: Control
PASS | Latency 0.00s
Axe WCAG: 0 | BP: 2
Assertions ✅
- ✅: All examples have a valid semantics (R): pass
- ✅: Collapsed content is hidden from assistive technology (R): pass
Axe Best Practice Issues (2) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 16 (GPT-5.2 Codex)
Instruction set: Control
PASS | Latency 0.00s
Axe WCAG: 0 | BP: 4
Assertions ✅
- ✅: All examples have a valid semantics (R): pass
- ✅: Collapsed content is hidden from assistive technology (R): pass
Axe Best Practice Issues (4) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- page-has-heading-one (moderate): Ensure that the page, or at least one of its frames contains a level-one heading (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 17 (GPT-5.2 Codex)
Instruction set: Control
PASS | Latency 0.00s
Axe WCAG: 0 | BP: 4
Assertions ✅
- ✅: All examples have a valid semantics (R): pass
- ✅: Collapsed content is hidden from assistive technology (R): pass
Axe Best Practice Issues (4) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- page-has-heading-one (moderate): Ensure that the page, or at least one of its frames contains a level-one heading (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 18 (GPT-5.2 Codex)
Instruction set: Control
PASS | Latency 0.00s
Axe WCAG: 0 | BP: 2
Assertions ✅
- ✅: All examples have a valid semantics (R): pass
- ✅: Collapsed content is hidden from assistive technology (R): pass
Axe Best Practice Issues (2) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- page-has-heading-one (moderate): Ensure that the page, or at least one of its frames contains a level-one heading (Best Practice - does not affect pass/fail)
Sample 19 (GPT-5.2 Codex)
Instruction set: Control
PASS | Latency 0.00s
Axe WCAG: 0 | BP: 2
Assertions ✅
- ✅: All examples have a valid semantics (R): pass
- ✅: Collapsed content is hidden from assistive technology (R): pass
Axe Best Practice Issues (2) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- page-has-heading-one (moderate): Ensure that the page, or at least one of its frames contains a level-one heading (Best Practice - does not affect pass/fail)
Sample 20 (GPT-5.2 Codex)
Instruction set: Control
PASS | Latency 0.00s
Axe WCAG: 0 | BP: 2
Assertions ✅
- ✅: All examples have a valid semantics (R): pass
- ✅: Collapsed content is hidden from assistive technology (R): pass
Axe Best Practice Issues (2) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 21 (GPT-5.2 Codex)
Instruction set: Control
PASS | Latency 0.00s
Axe WCAG: 0 | BP: 2
Assertions ✅
- ✅: All examples have a valid semantics (R): pass
- ✅: Collapsed content is hidden from assistive technology (R): pass
Axe Best Practice Issues (2) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- page-has-heading-one (moderate): Ensure that the page, or at least one of its frames contains a level-one heading (Best Practice - does not affect pass/fail)
Sample 22 (GPT-5.2 Codex)
Instruction set: Control
PASS | Latency 0.00s
Axe WCAG: 0 | BP: 2
Assertions ✅
- ✅: All examples have a valid semantics (R): pass
- ✅: Collapsed content is hidden from assistive technology (R): pass
Axe Best Practice Issues (2) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- page-has-heading-one (moderate): Ensure that the page, or at least one of its frames contains a level-one heading (Best Practice - does not affect pass/fail)
Sample 23 (GPT-5.2 Codex)
Instruction set: Control
PASS | Latency 0.00s
Axe WCAG: 0 | BP: 2
Assertions ✅
- ✅: All examples have a valid semantics (R): pass
- ✅: Collapsed content is hidden from assistive technology (R): pass
Axe Best Practice Issues (2) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- page-has-heading-one (moderate): Ensure that the page, or at least one of its frames contains a level-one heading (Best Practice - does not affect pass/fail)
Sample 24 (GPT-5.2 Codex)
Instruction set: Control
PASS | Latency 0.00s
Axe WCAG: 0 | BP: 2
Assertions ✅
- ✅: All examples have a valid semantics (R): pass
- ✅: Collapsed content is hidden from assistive technology (R): pass
Axe Best Practice Issues (2) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- page-has-heading-one (moderate): Ensure that the page, or at least one of its frames contains a level-one heading (Best Practice - does not affect pass/fail)
Sample 0 (GPT-5.2 Codex)
Instruction set: 1. Basic
PASS | Latency 0.00s
Axe WCAG: 0
Assertions ✅
- ✅: All examples have a valid semantics (R): pass
- ✅: Collapsed content is hidden from assistive technology (R): pass
Sample 1 (GPT-5.2 Codex)
Instruction set: 1. Basic
PASS | Latency 0.00s
Axe WCAG: 0
Assertions ✅
- ✅: All examples have a valid semantics (R): pass
- ✅: Collapsed content is hidden from assistive technology (R): pass
Sample 2 (GPT-5.2 Codex)
Instruction set: 1. Basic
PASS | Latency 0.00s
Axe WCAG: 0
Assertions ✅
- ✅: All examples have a valid semantics (R): pass
- ✅: Collapsed content is hidden from assistive technology (R): pass
Sample 3 (GPT-5.2 Codex)
Instruction set: 1. Basic
PASS | Latency 0.00s
Axe WCAG: 0
Assertions ✅
- ✅: All examples have a valid semantics (R): pass
- ✅: Collapsed content is hidden from assistive technology (R): pass
Sample 4 (GPT-5.2 Codex)
Instruction set: 1. Basic
PASS | Latency 0.00s
Axe WCAG: 0
Assertions ✅
- ✅: All examples have a valid semantics (R): pass
- ✅: Collapsed content is hidden from assistive technology (R): pass
Sample 0 (GPT-5.2 Codex)
Instruction set: 0. Minimal
PASS | Latency 0.00s
Axe WCAG: 0 | BP: 4
Assertions ✅
- ✅: All examples have a valid semantics (R): pass
- ✅: Collapsed content is hidden from assistive technology (R): pass
Axe Best Practice Issues (4) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 1 (GPT-5.2 Codex)
Instruction set: 0. Minimal
PASS | Latency 0.00s
Axe WCAG: 0 | BP: 2
Assertions ✅
- ✅: All examples have a valid semantics (R): pass
- ✅: Collapsed content is hidden from assistive technology (R): pass
Axe Best Practice Issues (2) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 2 (GPT-5.2 Codex)
Instruction set: 0. Minimal
PASS | Latency 0.00s
Axe WCAG: 0
Assertions ✅
- ✅: All examples have a valid semantics (R): pass
- ✅: Collapsed content is hidden from assistive technology (R): pass
Sample 3 (GPT-5.2 Codex)
Instruction set: 0. Minimal
PASS | Latency 0.00s
Axe WCAG: 0 | BP: 4
Assertions ✅
- ✅: All examples have a valid semantics (R): pass
- ✅: Collapsed content is hidden from assistive technology (R): pass
Axe Best Practice Issues (4) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 4 (GPT-5.2 Codex)
Instruction set: 0. Minimal
PASS | Latency 0.00s
Axe WCAG: 0 | BP: 4
Assertions ✅
- ✅: All examples have a valid semantics (R): pass
- ✅: Collapsed content is hidden from assistive technology (R): pass
Axe Best Practice Issues (4) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 0 (GPT-5.2 Codex)
Instruction set: 2. Detailed Instructions
PASS | Latency 0.00s
Axe WCAG: 0
Assertions ✅
- ✅: All examples have a valid semantics (R): pass
- ✅: Collapsed content is hidden from assistive technology (R): pass
Sample 1 (GPT-5.2 Codex)
Instruction set: 2. Detailed Instructions
PASS | Latency 0.00s
Axe WCAG: 0
Assertions ✅
- ✅: All examples have a valid semantics (R): pass
- ✅: Collapsed content is hidden from assistive technology (R): pass
Sample 2 (GPT-5.2 Codex)
Instruction set: 2. Detailed Instructions
PASS | Latency 0.00s
Axe WCAG: 0
Assertions ✅
- ✅: All examples have a valid semantics (R): pass
- ✅: Collapsed content is hidden from assistive technology (R): pass
Sample 3 (GPT-5.2 Codex)
Instruction set: 2. Detailed Instructions
PASS | Latency 0.00s
Axe WCAG: 0
Assertions ✅
- ✅: All examples have a valid semantics (R): pass
- ✅: Collapsed content is hidden from assistive technology (R): pass
Sample 4 (GPT-5.2 Codex)
Instruction set: 2. Detailed Instructions
PASS | Latency 0.00s
Axe WCAG: 0
Assertions ✅
- ✅: All examples have a valid semantics (R): pass
- ✅: Collapsed content is hidden from assistive technology (R): pass
Grok 4 Fast Non-Reasoning
Pass rate by instruction set: Control 0% | 0. Minimal 0% | 1. Basic 0% | 2. Detailed Instructions 0%
Aggregates are shown when filtering to a specific instruction set.
Samples: 25 | Passes: 0
| pass@1 | pass@5 | pass@10 |
|---|---|---|
| 0% | 0% | 0% |
Samples: 5 | Passes: 0
| pass@1 | pass@5 | pass@10 |
|---|---|---|
| 0% | 0% | 0% |
Samples: 5 | Passes: 0
| pass@1 | pass@5 | pass@10 |
|---|---|---|
| 0% | 0% | 0% |
Samples: 5 | Passes: 0
| pass@1 | pass@5 | pass@10 |
|---|---|---|
| 0% | 0% | 0% |
Sample 0 (Grok 4 Fast Non-Reasoning)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 0 | BP: 6
Assertions ❌
- ❌: All examples have a valid semantics (R): fail
- ✅: Collapsed content is hidden from assistive technology (R): pass
Axe Best Practice Issues (6) ⚠️
- heading-order (moderate): Ensure the order of headings is semantically correct (Best Practice - does not affect pass/fail)
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 1 (Grok 4 Fast Non-Reasoning)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 0 | BP: 6
Assertions ❌
- ❌: All examples have a valid semantics (R): fail
- ✅: Collapsed content is hidden from assistive technology (R): pass
Axe Best Practice Issues (6) ⚠️
- heading-order (moderate): Ensure the order of headings is semantically correct (Best Practice - does not affect pass/fail)
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 2 (Grok 4 Fast Non-Reasoning)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 0 | BP: 7
Assertions ❌
- ✅: All examples have a valid semantics (R): pass
- ❌: Collapsed content is hidden from assistive technology (R): fail
Axe Best Practice Issues (7) ⚠️
- aria-allowed-role (minor): Ensure role attribute has an appropriate value for the element (Best Practice - does not affect pass/fail)
- heading-order (moderate): Ensure the order of headings is semantically correct (Best Practice - does not affect pass/fail)
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 3 (Grok 4 Fast Non-Reasoning)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 0 | BP: 9
Assertions ❌
- ✅: All examples have a valid semantics (R): pass
- ❌: Collapsed content is hidden from assistive technology (R): fail
Axe Best Practice Issues (9) ⚠️
- aria-allowed-role (minor): Ensure role attribute has an appropriate value for the element (Best Practice - does not affect pass/fail)
- heading-order (moderate): Ensure the order of headings is semantically correct (Best Practice - does not affect pass/fail)
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 4 (Grok 4 Fast Non-Reasoning)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 0 | BP: 9
Assertions ❌
- ✅: All examples have a valid semantics (R): pass
- ❌: Collapsed content is hidden from assistive technology (R): fail
Axe Best Practice Issues (9) ⚠️
- aria-allowed-role (minor): Ensure role attribute has an appropriate value for the element (Best Practice - does not affect pass/fail)
- heading-order (moderate): Ensure the order of headings is semantically correct (Best Practice - does not affect pass/fail)
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 5 (Grok 4 Fast Non-Reasoning)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 0 | BP: 5
Assertions ❌
- ✅: All examples have a valid semantics (R): pass
- ❌: Collapsed content is hidden from assistive technology (R): fail
Axe Best Practice Issues (5) ⚠️
- aria-allowed-role (minor): Ensure role attribute has an appropriate value for the element (Best Practice - does not affect pass/fail)
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 6 (Grok 4 Fast Non-Reasoning)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 0 | BP: 3
Assertions ❌
- ❌: All examples have a valid semantics (R): fail
- ✅: Collapsed content is hidden from assistive technology (R): pass
Axe Best Practice Issues (3) ⚠️
- heading-order (moderate): Ensure the order of headings is semantically correct (Best Practice - does not affect pass/fail)
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 7 (Grok 4 Fast Non-Reasoning)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 0 | BP: 7
Assertions ❌
- ✅: All examples have a valid semantics (R): pass
- ❌: Collapsed content is hidden from assistive technology (R): fail
Axe Best Practice Issues (7) ⚠️
- aria-allowed-role (minor): Ensure role attribute has an appropriate value for the element (Best Practice - does not affect pass/fail)
- heading-order (moderate): Ensure the order of headings is semantically correct (Best Practice - does not affect pass/fail)
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 8 (Grok 4 Fast Non-Reasoning)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 0 | BP: 4
Assertions ❌
- ❌: All examples have a valid semantics (R): fail
- ✅: Collapsed content is hidden from assistive technology (R): pass
Axe Best Practice Issues (4) ⚠️
- heading-order (moderate): Ensure the order of headings is semantically correct (Best Practice - does not affect pass/fail)
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 9 (Grok 4 Fast Non-Reasoning)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 0 | BP: 9
Assertions ❌
- ✅: All examples have a valid semantics (R): pass
- ❌: Collapsed content is hidden from assistive technology (R): fail
Axe Best Practice Issues (9) ⚠️
- aria-allowed-role (minor): Ensure role attribute has an appropriate value for the element (Best Practice - does not affect pass/fail)
- heading-order (moderate): Ensure the order of headings is semantically correct (Best Practice - does not affect pass/fail)
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 10 (Grok 4 Fast Non-Reasoning)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 0 | BP: 6
Assertions ❌
- ❌: All examples have a valid semantics (R): fail
- ✅: Collapsed content is hidden from assistive technology (R): pass
Axe Best Practice Issues (6) ⚠️
- heading-order (moderate): Ensure the order of headings is semantically correct (Best Practice - does not affect pass/fail)
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 11 (Grok 4 Fast Non-Reasoning)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 0 | BP: 7
Assertions ❌
- ✅: All examples have a valid semantics (R): pass
- ❌: Collapsed content is hidden from assistive technology (R): fail
Axe Best Practice Issues (7) ⚠️
- aria-allowed-role (minor): Ensure role attribute has an appropriate value for the element (Best Practice - does not affect pass/fail)
- heading-order (moderate): Ensure the order of headings is semantically correct (Best Practice - does not affect pass/fail)
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 12 (Grok 4 Fast Non-Reasoning)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 0 | BP: 6
Assertions ❌
- ❌: All examples have a valid semantics (R): fail
- ✅: Collapsed content is hidden from assistive technology (R): pass
Axe Best Practice Issues (6) ⚠️
- heading-order (moderate): Ensure the order of headings is semantically correct (Best Practice - does not affect pass/fail)
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 13 (Grok 4 Fast Non-Reasoning)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 0 | BP: 9
Assertions ❌
- ✅: All examples have a valid semantics (R): pass
- ❌: Collapsed content is hidden from assistive technology (R): fail
Axe Best Practice Issues (9) ⚠️
- aria-allowed-role (minor): Ensure role attribute has an appropriate value for the element (Best Practice - does not affect pass/fail)
- heading-order (moderate): Ensure the order of headings is semantically correct (Best Practice - does not affect pass/fail)
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 14 (Grok 4 Fast Non-Reasoning)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 0 | BP: 6
Assertions ❌
- ❌: All examples have a valid semantics (R): fail
- ✅: Collapsed content is hidden from assistive technology (R): pass
Axe Best Practice Issues (6) ⚠️
- heading-order (moderate): Ensure the order of headings is semantically correct (Best Practice - does not affect pass/fail)
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 15 (Grok 4 Fast Non-Reasoning)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 0 | BP: 9
Assertions ❌
- ✅: All examples have a valid semantics (R): pass
- ❌: Collapsed content is hidden from assistive technology (R): fail
Axe Best Practice Issues (9) ⚠️
- aria-allowed-role (minor): Ensure role attribute has an appropriate value for the element (Best Practice - does not affect pass/fail)
- heading-order (moderate): Ensure the order of headings is semantically correct (Best Practice - does not affect pass/fail)
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 16 (Grok 4 Fast Non-Reasoning)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 0 | BP: 7
Assertions ❌
- ✅: All examples have a valid semantics (R): pass
- ❌: Collapsed content is hidden from assistive technology (R): fail
Axe Best Practice Issues (7) ⚠️
- aria-allowed-role (minor): Ensure role attribute has an appropriate value for the element (Best Practice - does not affect pass/fail)
- heading-order (moderate): Ensure the order of headings is semantically correct (Best Practice - does not affect pass/fail)
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 17 (Grok 4 Fast Non-Reasoning)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 0 | BP: 3
Assertions ❌
- ❌: All examples have a valid semantics (R): fail
- ✅: Collapsed content is hidden from assistive technology (R): pass
Axe Best Practice Issues (3) ⚠️
- heading-order (moderate): Ensure the order of headings is semantically correct (Best Practice - does not affect pass/fail)
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 18 (Grok 4 Fast Non-Reasoning)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 0 | BP: 5
Assertions ❌
- ✅: All examples have a valid semantics (R): pass
- ❌: Collapsed content is hidden from assistive technology (R): fail
Axe Best Practice Issues (5) ⚠️
- aria-allowed-role (minor): Ensure role attribute has an appropriate value for the element (Best Practice - does not affect pass/fail)
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 19 (Grok 4 Fast Non-Reasoning)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 1 | BP: 4
Assertions ❌
- ❌: All examples have a valid semantics (R): fail
- ✅: Collapsed content is hidden from assistive technology (R): pass
Axe WCAG Failures (1) ❌
- (1x) - nested-interactive (serious): Ensure interactive controls are not nested as they are not always announced by screen readers or can cause focus problems for assistive technologies
Axe Best Practice Issues (4) ⚠️
- heading-order (moderate): Ensure the order of headings is semantically correct (Best Practice - does not affect pass/fail)
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 20 (Grok 4 Fast Non-Reasoning)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 3 | BP: 9
Assertions ❌
- ❌: All examples have a valid semantics (R): fail
- ✅: Collapsed content is hidden from assistive technology (R): pass
Axe WCAG Failures (3) ❌
- (3x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (9) ⚠️
- heading-order (moderate): Ensure the order of headings is semantically correct (Best Practice - does not affect pass/fail)
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 21 (Grok 4 Fast Non-Reasoning)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 0 | BP: 6
Assertions ❌
- ❌: All examples have a valid semantics (R): fail
- ✅: Collapsed content is hidden from assistive technology (R): pass
Axe Best Practice Issues (6) ⚠️
- heading-order (moderate): Ensure the order of headings is semantically correct (Best Practice - does not affect pass/fail)
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 22 (Grok 4 Fast Non-Reasoning)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 0 | BP: 6
Assertions ❌
- ❌: All examples have a valid semantics (R): fail
- ✅: Collapsed content is hidden from assistive technology (R): pass
Axe Best Practice Issues (6) ⚠️
- heading-order (moderate): Ensure the order of headings is semantically correct (Best Practice - does not affect pass/fail)
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 23 (Grok 4 Fast Non-Reasoning)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 0 | BP: 4
Assertions ❌
- ❌: All examples have a valid semantics (R): fail
- ✅: Collapsed content is hidden from assistive technology (R): pass
Axe Best Practice Issues (4) ⚠️
- heading-order (moderate): Ensure the order of headings is semantically correct (Best Practice - does not affect pass/fail)
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 24 (Grok 4 Fast Non-Reasoning)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 1 | BP: 4
Assertions ❌
- ❌: All examples have a valid semantics (R): fail
- ✅: Collapsed content is hidden from assistive technology (R): pass
Axe WCAG Failures (1) ❌
- (1x) - nested-interactive (serious): Ensure interactive controls are not nested as they are not always announced by screen readers or can cause focus problems for assistive technologies
Axe Best Practice Issues (4) ⚠️
- heading-order (moderate): Ensure the order of headings is semantically correct (Best Practice - does not affect pass/fail)
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 0 (Grok 4 Fast Non-Reasoning)
Instruction set: 1. Basic
FAIL | Latency 0.00s
Axe WCAG: 0 | BP: 8
Assertions ❌
- ✅: All examples have a valid semantics (R): pass
- ❌: Collapsed content is hidden from assistive technology (R): fail
Axe Best Practice Issues (8) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 1 (Grok 4 Fast Non-Reasoning)
Instruction set: 1. Basic
FAIL | Latency 0.00s
Axe WCAG: 0 | BP: 6
Assertions ❌
- ✅: All examples have a valid semantics (R): pass
- ❌: Collapsed content is hidden from assistive technology (R): fail
Axe Best Practice Issues (6) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 2 (Grok 4 Fast Non-Reasoning)
Instruction set: 1. Basic
FAIL | Latency 0.00s
Axe WCAG: 0 | BP: 5
Assertions ❌
- ✅: All examples have a valid semantics (R): pass
- ❌: Collapsed content is hidden from assistive technology (R): fail
Axe Best Practice Issues (5) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 3 (Grok 4 Fast Non-Reasoning)
Instruction set: 1. Basic
FAIL | Latency 0.00s
Axe WCAG: 0 | BP: 5
Assertions ❌
- ✅: All examples have a valid semantics (R): pass
- ❌: Collapsed content is hidden from assistive technology (R): fail
Axe Best Practice Issues (5) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 4 (Grok 4 Fast Non-Reasoning)
Instruction set: 1. Basic
FAIL | Latency 0.00s
Axe WCAG: 0 | BP: 9
Assertions ❌
- ✅: All examples have a valid semantics (R): pass
- ❌: Collapsed content is hidden from assistive technology (R): fail
Axe Best Practice Issues (9) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 0 (Grok 4 Fast Non-Reasoning)
Instruction set: 0. Minimal
FAIL | Latency 0.00s
Axe WCAG: 0 | BP: 8
Assertions ❌
- ✅: All examples have a valid semantics (R): pass
- ❌: Collapsed content is hidden from assistive technology (R): fail
Axe Best Practice Issues (8) ⚠️
- aria-allowed-role (minor): Ensure role attribute has an appropriate value for the element (Best Practice - does not affect pass/fail)
- heading-order (moderate): Ensure the order of headings is semantically correct (Best Practice - does not affect pass/fail)
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- landmark-unique (moderate): Ensure landmarks are unique (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 1 (Grok 4 Fast Non-Reasoning)
Instruction set: 0. Minimal
FAIL | Latency 0.00s
Axe WCAG: 0 | BP: 7
Assertions ❌
- ❌: All examples have a valid semantics (R): fail
- ✅: Collapsed content is hidden from assistive technology (R): pass
Axe Best Practice Issues (7) ⚠️
- heading-order (moderate): Ensure the order of headings is semantically correct (Best Practice - does not affect pass/fail)
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- landmark-unique (moderate): Ensure landmarks are unique (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 2 (Grok 4 Fast Non-Reasoning)
Instruction set: 0. Minimal
FAIL | Latency 0.00s
Axe WCAG: 1 | BP: 3
Assertions ❌
- ❌: All examples have a valid semantics (R): fail
- ✅: Collapsed content is hidden from assistive technology (R): pass
Axe WCAG Failures (1) ❌
- (1x) - nested-interactive (serious): Ensure interactive controls are not nested as they are not always announced by screen readers or can cause focus problems for assistive technologies
Axe Best Practice Issues (3) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- landmark-unique (moderate): Ensure landmarks are unique (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 3 (Grok 4 Fast Non-Reasoning)
Instruction set: 0. Minimal
FAIL | Latency 0.00s
Axe WCAG: 4 | BP: 5
Assertions ❌
- ✅: All examples have a valid semantics (R): pass
- ❌: Collapsed content is hidden from assistive technology (R): fail
Axe WCAG Failures (4) ❌
- (1x) - aria-hidden-focus (serious): Ensure aria-hidden elements are not focusable nor contain focusable elements
- (3x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (5) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 4 (Grok 4 Fast Non-Reasoning)
Instruction set: 0. Minimal
FAIL | Latency 0.00s
Axe WCAG: 0 | BP: 3
Assertions ❌
- ❌: All examples have a valid semantics (R): fail
- ✅: Collapsed content is hidden from assistive technology (R): pass
Axe Best Practice Issues (3) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- landmark-unique (moderate): Ensure landmarks are unique (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 0 (Grok 4 Fast Non-Reasoning)
Instruction set: 2. Detailed Instructions
FAIL | Latency 0.00s
Axe WCAG: 1 | BP: 1
Assertions ❌
- ✅: All examples have a valid semantics (R): pass
- ❌: Collapsed content is hidden from assistive technology (R): fail
Axe WCAG Failures (1) ❌
- (1x) - aria-hidden-focus (serious): Ensure aria-hidden elements are not focusable nor contain focusable elements
Axe Best Practice Issues (1) ⚠️
- landmark-unique (moderate): Ensure landmarks are unique (Best Practice - does not affect pass/fail)
Sample 1 (Grok 4 Fast Non-Reasoning)
Instruction set: 2. Detailed Instructions
FAIL | Latency 0.00s
Axe WCAG: 0
Assertions ❌
- ✅: All examples have a valid semantics (R): pass
- ❌: Collapsed content is hidden from assistive technology (R): fail
Sample 2 (Grok 4 Fast Non-Reasoning)
Instruction set: 2. Detailed Instructions
FAIL | Latency 0.00s
Axe WCAG: 0
Assertions ❌
- ✅: All examples have a valid semantics (R): pass
- ❌: Collapsed content is hidden from assistive technology (R): fail
Sample 3 (Grok 4 Fast Non-Reasoning)
Instruction set: 2. Detailed Instructions
FAIL | Latency 0.00s
Axe WCAG: 3
Assertions ❌
- ✅: All examples have a valid semantics (R): pass
- ❌: Collapsed content is hidden from assistive technology (R): fail
Axe WCAG Failures (3) ❌
- (3x) - aria-hidden-focus (serious): Ensure aria-hidden elements are not focusable nor contain focusable elements
Sample 4 (Grok 4 Fast Non-Reasoning)
Instruction set: 2. Detailed Instructions
FAIL | Latency 0.00s
Axe WCAG: 1
Assertions ❌
- ✅: All examples have a valid semantics (R): pass
- ❌: Collapsed content is hidden from assistive technology (R): fail
Axe WCAG Failures (1) ❌
- (1x) - aria-hidden-focus (serious): Ensure aria-hidden elements are not focusable nor contain focusable elements
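The sample entries above suggest how a verdict is reached: every FAIL has either a non-zero Axe WCAG count or a failing assertion tagged (R), while Axe best-practice issues are marked as not affecting pass/fail. The following is a minimal sketch of that rule, not the benchmark's actual code. The `Assertion` and `Sample` types and the `verdict` function are hypothetical names, and treating (BP) assertions as non-blocking is an assumption the log does not confirm.

```python
from dataclasses import dataclass, field


@dataclass
class Assertion:
    name: str
    level: str    # "R" (required) or "BP" (best practice), as tagged in the log
    passed: bool


@dataclass
class Sample:
    axe_wcag_failures: int          # the "Axe WCAG: N" figure
    axe_best_practice_issues: int   # the "BP: N" figure, ignored for the verdict
    assertions: list[Assertion] = field(default_factory=list)


def verdict(sample: Sample) -> str:
    """Return "PASS" or "FAIL" under the assumed rule.

    Assumption: a sample passes only when Axe reports no WCAG failures and
    every assertion tagged (R) passes. Best-practice findings are ignored.
    """
    required_ok = all(a.passed for a in sample.assertions if a.level == "R")
    if sample.axe_wcag_failures == 0 and required_ok:
        return "PASS"
    return "FAIL"


# Mirrors a logged sample with Axe WCAG: 0 and one failing (R) assertion.
logged = Sample(
    axe_wcag_failures=0,
    axe_best_practice_issues=0,
    assertions=[
        Assertion("All examples have a valid semantics", "R", True),
        Assertion("Collapsed content is hidden from assistive technology", "R", False),
    ],
)
print(verdict(logged))
```

Under this rule a sample with zero Axe WCAG failures still fails when a single required assertion fails, which matches the entries logged with `Axe WCAG: 0` and `FAIL`.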
modal-dialog
Prompt
Create an example of a modal dialog component. It is closed by default, and the button to open it has a `trigger` class.
DeepSeek V3.2
— 0%
— 0%
— 0%
— 0%
Aggregates are shown when filtering to a specific instruction set.
Samples: 25 | Passes: 0
| pass@1 | pass@5 | pass@10 |
|---|---|---|
| 0% | 0% | 0% |
Samples: 5 | Passes: 0
| pass@1 | pass@5 | pass@10 |
|---|---|---|
| 0% | 0% | 0% |
Samples: 5 | Passes: 0
| pass@1 | pass@5 | pass@10 |
|---|---|---|
| 0% | 0% | 0% |
Samples: 5 | Passes: 0
| pass@1 | pass@5 | pass@10 |
|---|---|---|
| 0% | 0% | 0% |
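The pass@k figures in these aggregates can be reproduced from the sample and pass counts alone. The sketch below assumes the report uses the common unbiased estimator, 1 - C(n-c, k) / C(n, k), for n samples with c passes. The report does not state its formula, but this estimator agrees with the published rows (for example, 9 passes out of 25 gives 36%, 92% and 100%). The function name `pass_at_k` is hypothetical, and clamping k to n is an assumption made only because the 5-sample sets still list a pass@10 value.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Estimate the probability that at least one of k drawn samples passes.

    n: total samples, c: passing samples, k: number of samples drawn.
    """
    if n <= 0 or not 0 <= c <= n or k <= 0:
        raise ValueError("need n > 0, 0 <= c <= n and k > 0")
    k = min(k, n)  # assumption: cannot draw more samples than exist
    # comb(n - c, k) is 0 when fewer than k failing samples exist,
    # so the estimate becomes 1.0 in that case.
    return 1.0 - comb(n - c, k) / comb(n, k)


# The DeepSeek V3.2 aggregates above: no passes in any sample set.
for n in (25, 5):
    row = [pass_at_k(n, 0, k) for k in (1, 5, 10)]
    print(n, [f"{value:.0%}" for value in row])
```

With zero passes the estimate is 0% for every k, which is why all four aggregate tables for this model are identical apart from their sample counts.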
Sample 0 (DeepSeek V3.2)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 2 | BP: 10
Assertions ❌
- ❌: Each dialog has a dialog role (R): fail
- ❌: Each dialog can be closed by escape key (BP): fail - Unable to test because no dialog was found
- ❌: Each modal dialog traps keyboard focus (R): fail - Unable to test because no dialog was found
- ❌: Each modal dialog takes focus when opened (R): fail - Unable to test because no dialog was found
- ❌: Focus is not lost when each dialog closes (R): fail - Unable to test because no dialog was found
- ❌: Each modal dialog hides content behind it while open (R): fail - Unable to test because no dialog was found
Axe WCAG Failures (2) ❌
- (1x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
- (1x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (10) ⚠️
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 1 (DeepSeek V3.2)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 0 | BP: 4
Assertions ❌
- ❌: Each dialog has a dialog role (R): fail
- ❌: Each dialog can be closed by escape key (BP): fail - Unable to test because no dialog was found
- ❌: Each modal dialog traps keyboard focus (R): fail - Unable to test because no dialog was found
- ❌: Each modal dialog takes focus when opened (R): fail - Unable to test because no dialog was found
- ❌: Focus is not lost when each dialog closes (R): fail - Unable to test because no dialog was found
- ❌: Each modal dialog hides content behind it while open (R): fail - Unable to test because no dialog was found
Axe Best Practice Issues (4) ⚠️
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 2 (DeepSeek V3.2)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 2 | BP: 8
Assertions ❌
- ❌: Each dialog has a dialog role (R): fail
- ❌: Each dialog can be closed by escape key (BP): fail - Unable to test because no dialog was found
- ❌: Each modal dialog traps keyboard focus (R): fail - Unable to test because no dialog was found
- ❌: Each modal dialog takes focus when opened (R): fail - Unable to test because no dialog was found
- ❌: Focus is not lost when each dialog closes (R): fail - Unable to test because no dialog was found
- ❌: Each modal dialog hides content behind it while open (R): fail - Unable to test because no dialog was found
Axe WCAG Failures (2) ❌
- (1x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
- (1x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (8) ⚠️
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 3 (DeepSeek V3.2)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 4 | BP: 4
Assertions ❌
- ❌: Each dialog has a dialog role (R): fail
- ❌: Each dialog can be closed by escape key (BP): fail - Unable to test because no dialog was found
- ❌: Each modal dialog traps keyboard focus (R): fail - Unable to test because no dialog was found
- ❌: Each modal dialog takes focus when opened (R): fail - Unable to test because no dialog was found
- ❌: Focus is not lost when each dialog closes (R): fail - Unable to test because no dialog was found
- ❌: Each modal dialog hides content behind it while open (R): fail - Unable to test because no dialog was found
Axe WCAG Failures (4) ❌
- (2x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
- (2x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (4) ⚠️
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 4 (DeepSeek V3.2)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 0 | BP: 4
Assertions ❌
- ❌: Each dialog has a dialog role (R): fail
- ❌: Each dialog can be closed by escape key (BP): fail - Unable to test because no dialog was found
- ❌: Each modal dialog traps keyboard focus (R): fail - Unable to test because no dialog was found
- ❌: Each modal dialog takes focus when opened (R): fail - Unable to test because no dialog was found
- ❌: Focus is not lost when each dialog closes (R): fail - Unable to test because no dialog was found
- ❌: Each modal dialog hides content behind it while open (R): fail - Unable to test because no dialog was found
Axe Best Practice Issues (4) ⚠️
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 5 (DeepSeek V3.2)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 2 | BP: 4
Assertions ❌
- ❌: Each dialog has a dialog role (R): fail
- ❌: Each dialog can be closed by escape key (BP): fail - Unable to test because no dialog was found
- ❌: Each modal dialog traps keyboard focus (R): fail - Unable to test because no dialog was found
- ❌: Each modal dialog takes focus when opened (R): fail - Unable to test because no dialog was found
- ❌: Focus is not lost when each dialog closes (R): fail - Unable to test because no dialog was found
- ❌: Each modal dialog hides content behind it while open (R): fail - Unable to test because no dialog was found
Axe WCAG Failures (2) ❌
- (1x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
- (1x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (4) ⚠️
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 6 (DeepSeek V3.2)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 2 | BP: 8
Assertions ❌
- ❌: Each dialog has a dialog role (R): fail
- ❌: Each dialog can be closed by escape key (BP): fail - Unable to test because no dialog was found
- ❌: Each modal dialog traps keyboard focus (R): fail - Unable to test because no dialog was found
- ❌: Each modal dialog takes focus when opened (R): fail - Unable to test because no dialog was found
- ❌: Focus is not lost when each dialog closes (R): fail - Unable to test because no dialog was found
- ❌: Each modal dialog hides content behind it while open (R): fail - Unable to test because no dialog was found
Axe WCAG Failures (2) ❌
- (1x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
- (1x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (8) ⚠️
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 7 (DeepSeek V3.2)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 4 | BP: 4
Assertions ❌
- ❌: Each dialog has a dialog role (R): fail
- ❌: Each dialog can be closed by escape key (BP): fail - Unable to test because no dialog was found
- ❌: Each modal dialog traps keyboard focus (R): fail - Unable to test because no dialog was found
- ❌: Each modal dialog takes focus when opened (R): fail - Unable to test because no dialog was found
- ❌: Focus is not lost when each dialog closes (R): fail - Unable to test because no dialog was found
- ❌: Each modal dialog hides content behind it while open (R): fail - Unable to test because no dialog was found
Axe WCAG Failures (4) ❌
- (2x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
- (2x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (4) ⚠️
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 8 (DeepSeek V3.2)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 2 | BP: 4
Assertions ❌
- ❌: Each dialog has a dialog role (R): fail
- ❌: Each dialog can be closed by escape key (BP): fail - Unable to test because no dialog was found
- ❌: Each modal dialog traps keyboard focus (R): fail - Unable to test because no dialog was found
- ❌: Each modal dialog takes focus when opened (R): fail - Unable to test because no dialog was found
- ❌: Focus is not lost when each dialog closes (R): fail - Unable to test because no dialog was found
- ❌: Each modal dialog hides content behind it while open (R): fail - Unable to test because no dialog was found
Axe WCAG Failures (2) ❌
- (1x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
- (1x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (4) ⚠️
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 9 (DeepSeek V3.2)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 4 | BP: 4
Assertions ❌
- ❌: Each dialog has a dialog role (R): fail
- ❌: Each dialog can be closed by escape key (BP): fail - Unable to test because no dialog was found
- ❌: Each modal dialog traps keyboard focus (R): fail - Unable to test because no dialog was found
- ❌: Each modal dialog takes focus when opened (R): fail - Unable to test because no dialog was found
- ❌: Focus is not lost when each dialog closes (R): fail - Unable to test because no dialog was found
- ❌: Each modal dialog hides content behind it while open (R): fail - Unable to test because no dialog was found
Axe WCAG Failures (4) ❌
- (2x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
- (2x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (4) ⚠️
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 10 (DeepSeek V3.2)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 2 | BP: 8
Assertions ❌
- ❌: Each dialog has a dialog role (R): fail
- ❌: Each dialog can be closed by escape key (BP): fail - Unable to test because no dialog was found
- ❌: Each modal dialog traps keyboard focus (R): fail - Unable to test because no dialog was found
- ❌: Each modal dialog takes focus when opened (R): fail - Unable to test because no dialog was found
- ❌: Focus is not lost when each dialog closes (R): fail - Unable to test because no dialog was found
- ❌: Each modal dialog hides content behind it while open (R): fail - Unable to test because no dialog was found
Axe WCAG Failures (2) ❌
- (1x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
- (1x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (8) ⚠️
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 11 (DeepSeek V3.2)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 0 | BP: 4
Assertions ❌
- ❌: Each dialog has a dialog role (R): fail
- ❌: Each dialog can be closed by escape key (BP): fail - Unable to test because no dialog was found
- ❌: Each modal dialog traps keyboard focus (R): fail - Unable to test because no dialog was found
- ❌: Each modal dialog takes focus when opened (R): fail - Unable to test because no dialog was found
- ❌: Focus is not lost when each dialog closes (R): fail - Unable to test because no dialog was found
- ❌: Each modal dialog hides content behind it while open (R): fail - Unable to test because no dialog was found
Axe Best Practice Issues (4) ⚠️
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 12 (DeepSeek V3.2)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 2 | BP: 4
Assertions ❌
- ❌: Each dialog has a dialog role (R): fail
- ❌: Each dialog can be closed by escape key (BP): fail - Unable to test because no dialog was found
- ❌: Each modal dialog traps keyboard focus (R): fail - Unable to test because no dialog was found
- ❌: Each modal dialog takes focus when opened (R): fail - Unable to test because no dialog was found
- ❌: Focus is not lost when each dialog closes (R): fail - Unable to test because no dialog was found
- ❌: Each modal dialog hides content behind it while open (R): fail - Unable to test because no dialog was found
Axe WCAG Failures (2) ❌
- (1x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
- (1x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (4) ⚠️
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 13 (DeepSeek V3.2)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 0 | BP: 4
Assertions ❌
- ❌: Each dialog has a dialog role (R): fail
- ❌: Each dialog can be closed by escape key (BP): fail - Unable to test because no dialog was found
- ❌: Each modal dialog traps keyboard focus (R): fail - Unable to test because no dialog was found
- ❌: Each modal dialog takes focus when opened (R): fail - Unable to test because no dialog was found
- ❌: Focus is not lost when each dialog closes (R): fail - Unable to test because no dialog was found
- ❌: Each modal dialog hides content behind it while open (R): fail - Unable to test because no dialog was found
Axe Best Practice Issues (4) ⚠️
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 14 (DeepSeek V3.2)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 4 | BP: 4
Assertions ❌
- ❌: Each dialog has a dialog role (R): fail
- ❌: Each dialog can be closed by escape key (BP): fail - Unable to test because no dialog was found
- ❌: Each modal dialog traps keyboard focus (R): fail - Unable to test because no dialog was found
- ❌: Each modal dialog takes focus when opened (R): fail - Unable to test because no dialog was found
- ❌: Focus is not lost when each dialog closes (R): fail - Unable to test because no dialog was found
- ❌: Each modal dialog hides content behind it while open (R): fail - Unable to test because no dialog was found
Axe WCAG Failures (4) ❌
- (2x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
- (2x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (4) ⚠️
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 15 (DeepSeek V3.2)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 0 | BP: 4
Assertions ❌
- ❌: Each dialog has a dialog role (R): fail
- ❌: Each dialog can be closed by escape key (BP): fail - Unable to test because no dialog was found
- ❌: Each modal dialog traps keyboard focus (R): fail - Unable to test because no dialog was found
- ❌: Each modal dialog takes focus when opened (R): fail - Unable to test because no dialog was found
- ❌: Focus is not lost when each dialog closes (R): fail - Unable to test because no dialog was found
- ❌: Each modal dialog hides content behind it while open (R): fail - Unable to test because no dialog was found
Axe Best Practice Issues (4) ⚠️
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 16 (DeepSeek V3.2)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 4 | BP: 4
Assertions ❌
- ❌: Each dialog has a dialog role (R): fail
- ❌: Each dialog can be closed by escape key (BP): fail - Unable to test because no dialog was found
- ❌: Each modal dialog traps keyboard focus (R): fail - Unable to test because no dialog was found
- ❌: Each modal dialog takes focus when opened (R): fail - Unable to test because no dialog was found
- ❌: Focus is not lost when each dialog closes (R): fail - Unable to test because no dialog was found
- ❌: Each modal dialog hides content behind it while open (R): fail - Unable to test because no dialog was found
Axe WCAG Failures (4) ❌
- (2x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
- (2x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (4) ⚠️
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 17 (DeepSeek V3.2)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 0 | BP: 4
Assertions ❌
- ❌: Each dialog has a dialog role (R): fail
- ❌: Each dialog can be closed by escape key (BP): fail - Unable to test because no dialog was found
- ❌: Each modal dialog traps keyboard focus (R): fail - Unable to test because no dialog was found
- ❌: Each modal dialog takes focus when opened (R): fail - Unable to test because no dialog was found
- ❌: Focus is not lost when each dialog closes (R): fail - Unable to test because no dialog was found
- ❌: Each modal dialog hides content behind it while open (R): fail - Unable to test because no dialog was found
Axe Best Practice Issues (4) ⚠️
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 18 (DeepSeek V3.2)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 2 | BP: 4
Assertions ❌
- ❌: Each dialog has a dialog role (R): fail
- ❌: Each dialog can be closed by escape key (BP): fail - Unable to test because no dialog was found
- ❌: Each modal dialog traps keyboard focus (R): fail - Unable to test because no dialog was found
- ❌: Each modal dialog takes focus when opened (R): fail - Unable to test because no dialog was found
- ❌: Focus is not lost when each dialog closes (R): fail - Unable to test because no dialog was found
- ❌: Each modal dialog hides content behind it while open (R): fail - Unable to test because no dialog was found
Axe WCAG Failures (2) ❌
- (1x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
- (1x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (4) ⚠️
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 19 (DeepSeek V3.2)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 2 | BP: 6
Assertions ❌
- ❌: Each dialog has a dialog role (R): fail
- ❌: Each dialog can be closed by escape key (BP): fail - Unable to test because no dialog was found
- ❌: Each modal dialog traps keyboard focus (R): fail - Unable to test because no dialog was found
- ❌: Each modal dialog takes focus when opened (R): fail - Unable to test because no dialog was found
- ❌: Focus is not lost when each dialog closes (R): fail - Unable to test because no dialog was found
- ❌: Each modal dialog hides content behind it while open (R): fail - Unable to test because no dialog was found
Axe WCAG Failures (2) ❌
- (1x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
- (1x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (6) ⚠️
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 20 (DeepSeek V3.2)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 0 | BP: 4
Assertions ❌
- ❌: Each dialog has a dialog role (R): fail
- ❌: Each dialog can be closed by escape key (BP): fail - Unable to test because no dialog was found
- ❌: Each modal dialog traps keyboard focus (R): fail - Unable to test because no dialog was found
- ❌: Each modal dialog takes focus when opened (R): fail - Unable to test because no dialog was found
- ❌: Focus is not lost when each dialog closes (R): fail - Unable to test because no dialog was found
- ❌: Each modal dialog hides content behind it while open (R): fail - Unable to test because no dialog was found
Axe Best Practice Issues (4) ⚠️
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 21 (DeepSeek V3.2)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 4 | BP: 4
Assertions ❌
- ❌: Each dialog has a dialog role (R): fail
- ❌: Each dialog can be closed by escape key (BP): fail - Unable to test because no dialog was found
- ❌: Each modal dialog traps keyboard focus (R): fail - Unable to test because no dialog was found
- ❌: Each modal dialog takes focus when opened (R): fail - Unable to test because no dialog was found
- ❌: Focus is not lost when each dialog closes (R): fail - Unable to test because no dialog was found
- ❌: Each modal dialog hides content behind it while open (R): fail - Unable to test because no dialog was found
Axe WCAG Failures (4) ❌
- (2x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
- (2x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (4) ⚠️
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 22 (DeepSeek V3.2)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 0 | BP: 4
Assertions ❌
- ❌: Each dialog has a dialog role (R): fail
- ❌: Each dialog can be closed by escape key (BP): fail - Unable to test because no dialog was found
- ❌: Each modal dialog traps keyboard focus (R): fail - Unable to test because no dialog was found
- ❌: Each modal dialog takes focus when opened (R): fail - Unable to test because no dialog was found
- ❌: Focus is not lost when each dialog closes (R): fail - Unable to test because no dialog was found
- ❌: Each modal dialog hides content behind it while open (R): fail - Unable to test because no dialog was found
Axe Best Practice Issues (4) ⚠️
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 23 (DeepSeek V3.2)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 0 | BP: 4
Assertions ❌
- ❌: Each dialog has a dialog role (R): fail
- ❌: Each dialog can be closed by escape key (BP): fail - Unable to test because no dialog was found
- ❌: Each modal dialog traps keyboard focus (R): fail - Unable to test because no dialog was found
- ❌: Each modal dialog takes focus when opened (R): fail - Unable to test because no dialog was found
- ❌: Focus is not lost when each dialog closes (R): fail - Unable to test because no dialog was found
- ❌: Each modal dialog hides content behind it while open (R): fail - Unable to test because no dialog was found
Axe Best Practice Issues (4) ⚠️
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 24 (DeepSeek V3.2)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 0 | BP: 6
Assertions ❌
- ❌: Each dialog has a dialog role (R): fail
- ❌: Each dialog can be closed by escape key (BP): fail - Unable to test because no dialog was found
- ❌: Each modal dialog traps keyboard focus (R): fail - Unable to test because no dialog was found
- ❌: Each modal dialog takes focus when opened (R): fail - Unable to test because no dialog was found
- ❌: Focus is not lost when each dialog closes (R): fail - Unable to test because no dialog was found
- ❌: Each modal dialog hides content behind it while open (R): fail - Unable to test because no dialog was found
Axe Best Practice Issues (6) ⚠️
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 0 (DeepSeek V3.2)
Instruction set: 1. Basic
FAIL | Latency 0.00s
Axe WCAG: 0
Assertions ❌
- ✅: Each dialog has a dialog role (R): pass
- ✅: Each dialog can be closed by escape key (BP): pass
- ✅: Each modal dialog traps keyboard focus (R): pass
- ✅: Each modal dialog takes focus when opened (R): pass
- ✅: Focus is not lost when each dialog closes (R): pass
- ❌: Each modal dialog hides content behind it while open (R): fail
Sample 1 (DeepSeek V3.2)
Instruction set: 1. Basic
FAIL | Latency 0.00s
Axe WCAG: 0
Assertions ❌
- ✅: Each dialog has a dialog role (R): pass
- ✅: Each dialog can be closed by escape key (BP): pass
- ✅: Each modal dialog traps keyboard focus (R): pass
- ❌: Each modal dialog takes focus when opened (R): fail
- ✅: Focus is not lost when each dialog closes (R): pass
- ❌: Each modal dialog hides content behind it while open (R): fail
Sample 2 (DeepSeek V3.2)
Instruction set: 1. Basic
FAIL | Latency 0.00s
Axe WCAG: 2
Assertions ❌
- ✅: Each dialog has a dialog role (R): pass
- ✅: Each dialog can be closed by escape key (BP): pass
- ✅: Each modal dialog traps keyboard focus (R): pass
- ✅: Each modal dialog takes focus when opened (R): pass
- ✅: Focus is not lost when each dialog closes (R): pass
- ❌: Each modal dialog hides content behind it while open (R): fail
Axe WCAG Failures (2) ❌
- (1x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
- (1x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Sample 3 (DeepSeek V3.2)
Instruction set: 1. Basic
FAIL | Latency 0.00s
Axe WCAG: 0
Assertions ❌
- ✅: Each dialog has a dialog role (R): pass
- ✅: Each dialog can be closed by escape key (BP): pass
- ✅: Each modal dialog traps keyboard focus (R): pass
- ✅: Each modal dialog takes focus when opened (R): pass
- ✅: Focus is not lost when each dialog closes (R): pass
- ❌: Each modal dialog hides content behind it while open (R): fail
Sample 4 (DeepSeek V3.2)
Instruction set: 1. Basic
FAIL | Latency 0.00s
Axe WCAG: 0
Assertions ❌
- ✅: Each dialog has a dialog role (R): pass
- ✅: Each dialog can be closed by escape key (BP): pass
- ✅: Each modal dialog traps keyboard focus (R): pass
- ✅: Each modal dialog takes focus when opened (R): pass
- ✅: Focus is not lost when each dialog closes (R): pass
- ❌: Each modal dialog hides content behind it while open (R): fail
Sample 0 (DeepSeek V3.2)
Instruction set: 0. Minimal
FAIL | Latency 0.00s
Axe WCAG: 4
Assertions ❌
- ✅: Each dialog has a dialog role (R): pass
- ✅: Each dialog can be closed by escape key (BP): pass
- ✅: Each modal dialog traps keyboard focus (R): pass
- ✅: Each modal dialog takes focus when opened (R): pass
- ✅: Focus is not lost when each dialog closes (R): pass
- ❌: Each modal dialog hides content behind it while open (R): fail
Axe WCAG Failures (4) ❌
- (2x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
- (2x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Sample 1 (DeepSeek V3.2)
Instruction set: 0. Minimal
FAIL | Latency 0.00s
Axe WCAG: 2
Assertions ❌
- ✅: Each dialog has a dialog role (R): pass
- ✅: Each dialog can be closed by escape key (BP): pass
- ✅: Each modal dialog traps keyboard focus (R): pass
- ✅: Each modal dialog takes focus when opened (R): pass
- ✅: Focus is not lost when each dialog closes (R): pass
- ❌: Each modal dialog hides content behind it while open (R): fail
Axe WCAG Failures (2) ❌
- (1x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
- (1x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Sample 2 (DeepSeek V3.2)
Instruction set: 0. Minimal
FAIL | Latency 0.00s
Axe WCAG: 4
Assertions ❌
- ✅: Each dialog has a dialog role (R): pass
- ✅: Each dialog can be closed by escape key (BP): pass
- ✅: Each modal dialog traps keyboard focus (R): pass
- ✅: Each modal dialog takes focus when opened (R): pass
- ✅: Focus is not lost when each dialog closes (R): pass
- ❌: Each modal dialog hides content behind it while open (R): fail
Axe WCAG Failures (4) ❌
- (2x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
- (2x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Sample 3 (DeepSeek V3.2)
Instruction set: 0. Minimal
FAIL | Latency 0.00s
Axe WCAG: 2
Assertions ❌
- ✅: Each dialog has a dialog role (R): pass
- ✅: Each dialog can be closed by escape key (BP): pass
- ✅: Each modal dialog traps keyboard focus (R): pass
- ✅: Each modal dialog takes focus when opened (R): pass
- ✅: Focus is not lost when each dialog closes (R): pass
- ❌: Each modal dialog hides content behind it while open (R): fail
Axe WCAG Failures (2) ❌
- (1x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
- (1x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Sample 4 (DeepSeek V3.2)
Instruction set: 0. Minimal
FAIL | Latency 0.00s
Axe WCAG: 4
Assertions ❌
- ✅: Each dialog has a dialog role (R): pass
- ✅: Each dialog can be closed by escape key (BP): pass
- ✅: Each modal dialog traps keyboard focus (R): pass
- ✅: Each modal dialog takes focus when opened (R): pass
- ✅: Focus is not lost when each dialog closes (R): pass
- ❌: Each modal dialog hides content behind it while open (R): fail
Axe WCAG Failures (4) ❌
- (2x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
- (2x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Sample 0 (DeepSeek V3.2)
Instruction set: 2. Detailed Instructions
FAIL | Latency 0.00s
Axe WCAG: 2
Assertions ❌
- ✅: Each dialog has a dialog role (R): pass
- ❌: Each dialog can be closed by escape key (BP): fail
- ✅: Each modal dialog traps keyboard focus (R): pass
- ❌: Each modal dialog takes focus when opened (R): fail
- ✅: Focus is not lost when each dialog closes (R): pass
- ❌: Each modal dialog hides content behind it while open (R): fail
Axe WCAG Failures (2) ❌
- (1x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
- (1x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Sample 1 (DeepSeek V3.2)
Instruction set: 2. Detailed Instructions
FAIL | Latency 0.00s
Axe WCAG: 0
Assertions ❌
- ✅: Each dialog has a dialog role (R): pass
- ✅: Each dialog can be closed by escape key (BP): pass
- ✅: Each modal dialog traps keyboard focus (R): pass
- ✅: Each modal dialog takes focus when opened (R): pass
- ✅: Focus is not lost when each dialog closes (R): pass
- ❌: Each modal dialog hides content behind it while open (R): fail
Sample 2 (DeepSeek V3.2)
Instruction set: 2. Detailed Instructions
FAIL | Latency 0.00s
Axe WCAG: 0
Assertions ❌
- ✅: Each dialog has a dialog role (R): pass
- ✅: Each dialog can be closed by escape key (BP): pass
- ✅: Each modal dialog traps keyboard focus (R): pass
- ✅: Each modal dialog takes focus when opened (R): pass
- ✅: Focus is not lost when each dialog closes (R): pass
- ❌: Each modal dialog hides content behind it while open (R): fail
Sample 3 (DeepSeek V3.2)
Instruction set: 2. Detailed Instructions
FAIL | Latency 0.00s
Axe WCAG: 0
Assertions ❌
- ✅: Each dialog has a dialog role (R): pass
- ✅: Each dialog can be closed by escape key (BP): pass
- ✅: Each modal dialog traps keyboard focus (R): pass
- ❌: Each modal dialog takes focus when opened (R): fail
- ✅: Focus is not lost when each dialog closes (R): pass
- ❌: Each modal dialog hides content behind it while open (R): fail
Sample 4 (DeepSeek V3.2)
Instruction set: 2. Detailed Instructions
FAIL | Latency 0.00s
Axe WCAG: 0
Assertions ❌
- ✅: Each dialog has a dialog role (R): pass
- ✅: Each dialog can be closed by escape key (BP): pass
- ✅: Each modal dialog traps keyboard focus (R): pass
- ✅: Each modal dialog takes focus when opened (R): pass
- ✅: Focus is not lost when each dialog closes (R): pass
- ❌: Each modal dialog hides content behind it while open (R): fail
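The samples above suggest how a FAIL verdict comes about: several report `Axe WCAG: 0` and still fail because one assertion marked (R) fails. The sketch below encodes that reading. It assumes (R) marks a required WCAG assertion and (BP) marks a best-practice assertion that does not count toward the verdict; the report does not spell this rule out, and the `Assertion` type and `sample_passes` function are hypothetical names used only for illustration.

```python
from dataclasses import dataclass


@dataclass
class Assertion:
    name: str
    required: bool  # assumption: True for "(R)", False for "(BP)"
    passed: bool


def sample_passes(axe_wcag_failures: int, assertions: list[Assertion]) -> bool:
    """Assumed rule: pass only with zero axe WCAG failures and no failed (R) assertion."""
    required_failures = sum(1 for a in assertions if a.required and not a.passed)
    return axe_wcag_failures == 0 and required_failures == 0


# Sample 0, instruction set "1. Basic": Axe WCAG is 0, one (R) assertion fails.
basic_sample_0 = [
    Assertion("Each dialog has a dialog role", True, True),
    Assertion("Each dialog can be closed by escape key", False, True),
    Assertion("Each modal dialog traps keyboard focus", True, True),
    Assertion("Each modal dialog takes focus when opened", True, True),
    Assertion("Focus is not lost when each dialog closes", True, True),
    Assertion("Each modal dialog hides content behind it while open", True, False),
]
print("PASS" if sample_passes(0, basic_sample_0) else "FAIL")
```

Under this assumed rule, the single failing (R) assertion is enough to fail the sample even though axe reports no WCAG failures, which matches the FAIL shown for that sample.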
Claude Haiku 4.5
Pass rate by instruction set: 0%, 0%, 100%, 20%
Aggregates are shown when filtering to a specific instruction set.
Samples: 25 | Passes: 0
| pass@1 | pass@5 | pass@10 |
|---|---|---|
| 0% | 0% | 0% |
Samples: 5 | Passes: 0
| pass@1 | pass@5 | pass@10 |
|---|---|---|
| 0% | 0% | 0% |
Samples: 5 | Passes: 5
| pass@1 | pass@5 | pass@10 |
|---|---|---|
| 100% | 100% | 100% |
Samples: 5 | Passes: 1
| pass@1 | pass@5 | pass@10 |
|---|---|---|
| 20% | 100% | 100% |
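The pass@k figures in these tables are consistent with the usual combinatorial estimator, 1 - C(n-c, k) / C(n, k), for n samples of which c pass. The sketch below is one way to reproduce them, not necessarily the code this report uses. Clamping k to n when fewer than k samples exist is an assumption, chosen because it matches the rows with 5 samples; `pass_at_k` is a hypothetical name.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Chance that at least one of k samples, drawn without replacement
    from n samples of which c passed, is a passing sample."""
    if n <= 0 or k <= 0 or not 0 <= c <= n:
        raise ValueError("need n > 0, k > 0 and 0 <= c <= n")
    k = min(k, n)  # assumption: clamp k when fewer than k samples exist
    # comb(n - c, k) is 0 when k > n - c, which yields a probability of 1.
    return 1.0 - comb(n - c, k) / comb(n, k)


# The four aggregates listed above: (samples, passes).
for n, c in [(25, 0), (5, 0), (5, 5), (5, 1)]:
    row = " | ".join(f"{round(100 * pass_at_k(n, c, k))}%" for k in (1, 5, 10))
    print(f"Samples: {n} | Passes: {c} -> {row}")
```

With 5 samples and 1 pass, pass@1 is 1 - 4/5 = 20%, while any draw of all 5 samples must include the passing one, so pass@5 is 100%.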
Sample 0 (Claude Haiku 4.5)
Instruction set: Control
FAIL | Latency 0.00s cached
Axe WCAG: 4 | BP: 4 | $0.0090
Assertions ❌
- ❌: Each dialog has a dialog role (R): fail
- ❌: Each dialog can be closed by escape key (BP): fail - Unable to test because no dialog was found
- ❌: Each modal dialog traps keyboard focus (R): fail - Unable to test because no dialog was found
- ❌: Each modal dialog takes focus when opened (R): fail - Unable to test because no dialog was found
- ❌: Focus is not lost when each dialog closes (R): fail - Unable to test because no dialog was found
- ❌: Each modal dialog hides content behind it while open (R): fail - Unable to test because no dialog was found
Axe WCAG Failures (4) ❌
- (2x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
- (2x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (4) ⚠️
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 1 (Claude Haiku 4.5)
Instruction set: Control
FAIL | Latency 0.00s cached
Axe WCAG: 4 | BP: 4 | $0.0088
Assertions ❌
- ❌: Each dialog has a dialog role (R): fail
- ❌: Each dialog can be closed by escape key (BP): fail - Unable to test because no dialog was found
- ❌: Each modal dialog traps keyboard focus (R): fail - Unable to test because no dialog was found
- ❌: Each modal dialog takes focus when opened (R): fail - Unable to test because no dialog was found
- ❌: Focus is not lost when each dialog closes (R): fail - Unable to test because no dialog was found
- ❌: Each modal dialog hides content behind it while open (R): fail - Unable to test because no dialog was found
Axe WCAG Failures (4) ❌
- (2x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
- (2x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (4) ⚠️
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 2 (Claude Haiku 4.5)
Instruction set: Control
FAIL | Latency 0.00s cached
Axe WCAG: 4 | BP: 4 | $0.0090
Assertions ❌
- ❌: Each dialog has a dialog role (R): fail
- ❌: Each dialog can be closed by escape key (BP): fail - Unable to test because no dialog was found
- ❌: Each modal dialog traps keyboard focus (R): fail - Unable to test because no dialog was found
- ❌: Each modal dialog takes focus when opened (R): fail - Unable to test because no dialog was found
- ❌: Focus is not lost when each dialog closes (R): fail - Unable to test because no dialog was found
- ❌: Each modal dialog hides content behind it while open (R): fail - Unable to test because no dialog was found
Axe WCAG Failures (4) ❌
- (2x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
- (2x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (4) ⚠️
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 3 (Claude Haiku 4.5)
Instruction set: Control
FAIL | Latency 0.00s cached
Axe WCAG: 4 | BP: 4 | $0.0091
Assertions ❌
- ❌: Each dialog has a dialog role (R): fail
- ❌: Each dialog can be closed by escape key (BP): fail - Unable to test because no dialog was found
- ❌: Each modal dialog traps keyboard focus (R): fail - Unable to test because no dialog was found
- ❌: Each modal dialog takes focus when opened (R): fail - Unable to test because no dialog was found
- ❌: Focus is not lost when each dialog closes (R): fail - Unable to test because no dialog was found
- ❌: Each modal dialog hides content behind it while open (R): fail - Unable to test because no dialog was found
Axe WCAG Failures (4) ❌
- (2x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
- (2x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (4) ⚠️
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 4 (Claude Haiku 4.5)
Instruction set: Control
FAIL | Latency 0.00s cached
Axe WCAG: 4 | BP: 4 | $0.0088
Assertions ❌
- ❌: Each dialog has a dialog role (R): fail
- ❌: Each dialog can be closed by escape key (BP): fail - Unable to test because no dialog was found
- ❌: Each modal dialog traps keyboard focus (R): fail - Unable to test because no dialog was found
- ❌: Each modal dialog takes focus when opened (R): fail - Unable to test because no dialog was found
- ❌: Focus is not lost when each dialog closes (R): fail - Unable to test because no dialog was found
- ❌: Each modal dialog hides content behind it while open (R): fail - Unable to test because no dialog was found
Axe WCAG Failures (4) ❌
- (2x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
- (2x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (4) ⚠️
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 5 (Claude Haiku 4.5)
Instruction set: Control
FAIL | Latency 0.00s cached
Axe WCAG: 4 | BP: 4 | $0.0087
Assertions ❌
- ❌: Each dialog has a dialog role (R): fail
- ❌: Each dialog can be closed by escape key (BP): fail - Unable to test because no dialog was found
- ❌: Each modal dialog traps keyboard focus (R): fail - Unable to test because no dialog was found
- ❌: Each modal dialog takes focus when opened (R): fail - Unable to test because no dialog was found
- ❌: Focus is not lost when each dialog closes (R): fail - Unable to test because no dialog was found
- ❌: Each modal dialog hides content behind it while open (R): fail - Unable to test because no dialog was found
Axe WCAG Failures (4) ❌
- (2x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
- (2x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (4) ⚠️
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 6 (Claude Haiku 4.5)
Instruction set: Control
FAIL | Latency 0.00s cached
Axe WCAG: 0 | BP: 4 | $0.0093
Assertions ❌
- ❌: Each dialog has a dialog role (R): fail
- ❌: Each dialog can be closed by escape key (BP): fail - Unable to test because no dialog was found
- ❌: Each modal dialog traps keyboard focus (R): fail - Unable to test because no dialog was found
- ❌: Each modal dialog takes focus when opened (R): fail - Unable to test because no dialog was found
- ❌: Focus is not lost when each dialog closes (R): fail - Unable to test because no dialog was found
- ❌: Each modal dialog hides content behind it while open (R): fail - Unable to test because no dialog was found
Axe Best Practice Issues (4) ⚠️
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 7 (Claude Haiku 4.5)
Instruction set: Control
FAIL | Latency 0.00s cached
Axe WCAG: 4 | BP: 4 | $0.0089
Assertions ❌
- ❌: Each dialog has a dialog role (R): fail
- ❌: Each dialog can be closed by escape key (BP): fail - Unable to test because no dialog was found
- ❌: Each modal dialog traps keyboard focus (R): fail - Unable to test because no dialog was found
- ❌: Each modal dialog takes focus when opened (R): fail - Unable to test because no dialog was found
- ❌: Focus is not lost when each dialog closes (R): fail - Unable to test because no dialog was found
- ❌: Each modal dialog hides content behind it while open (R): fail - Unable to test because no dialog was found
Axe WCAG Failures (4) ❌
- (2x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
- (2x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (4) ⚠️
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 8 (Claude Haiku 4.5)
Instruction set: Control
FAIL | Latency 0.00s cached
Axe WCAG: 4 | BP: 4 | $0.0087
Assertions ❌
- ❌: Each dialog has a dialog role (R): fail
- ❌: Each dialog can be closed by escape key (BP): fail - Unable to test because no dialog was found
- ❌: Each modal dialog traps keyboard focus (R): fail - Unable to test because no dialog was found
- ❌: Each modal dialog takes focus when opened (R): fail - Unable to test because no dialog was found
- ❌: Focus is not lost when each dialog closes (R): fail - Unable to test because no dialog was found
- ❌: Each modal dialog hides content behind it while open (R): fail - Unable to test because no dialog was found
Axe WCAG Failures (4) ❌
- (2x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
- (2x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (4) ⚠️
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 9 (Claude Haiku 4.5)
Instruction set: Control
FAIL | Latency 0.00s cached
Axe WCAG: 4 | BP: 4 | $0.0090
Assertions ❌
- ❌: Each dialog has a dialog role (R): fail
- ❌: Each dialog can be closed by escape key (BP): fail - Unable to test because no dialog was found
- ❌: Each modal dialog traps keyboard focus (R): fail - Unable to test because no dialog was found
- ❌: Each modal dialog takes focus when opened (R): fail - Unable to test because no dialog was found
- ❌: Focus is not lost when each dialog closes (R): fail - Unable to test because no dialog was found
- ❌: Each modal dialog hides content behind it while open (R): fail - Unable to test because no dialog was found
Axe WCAG Failures (4) ❌
- (2x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
- (2x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (4) ⚠️
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 10 (Claude Haiku 4.5)
Instruction set: Control
FAIL | Latency 0.00s cached
Axe WCAG: 4 | BP: 4 | $0.0091
Assertions ❌
- ❌: Each dialog has a dialog role (R): fail
- ❌: Each dialog can be closed by escape key (BP): fail - Unable to test because no dialog was found
- ❌: Each modal dialog traps keyboard focus (R): fail - Unable to test because no dialog was found
- ❌: Each modal dialog takes focus when opened (R): fail - Unable to test because no dialog was found
- ❌: Focus is not lost when each dialog closes (R): fail - Unable to test because no dialog was found
- ❌: Each modal dialog hides content behind it while open (R): fail - Unable to test because no dialog was found
Axe WCAG Failures (4) ❌
- (2x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
- (2x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (4) ⚠️
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 11 (Claude Haiku 4.5)
Instruction set: Control
FAIL | Latency 0.00s cached
Axe WCAG: 4 | BP: 4 | $0.0089
Assertions ❌
- ❌: Each dialog has a dialog role (R): fail
- ❌: Each dialog can be closed by escape key (BP): fail - Unable to test because no dialog was found
- ❌: Each modal dialog traps keyboard focus (R): fail - Unable to test because no dialog was found
- ❌: Each modal dialog takes focus when opened (R): fail - Unable to test because no dialog was found
- ❌: Focus is not lost when each dialog closes (R): fail - Unable to test because no dialog was found
- ❌: Each modal dialog hides content behind it while open (R): fail - Unable to test because no dialog was found
Axe WCAG Failures (4) ❌
- (2x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
- (2x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (4) ⚠️
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 12 (Claude Haiku 4.5)
Instruction set: Control
FAIL | Latency 0.00s cached
Axe WCAG: 4 | BP: 4 | $0.0090
Assertions ❌
- ❌: Each dialog has a dialog role (R): fail
- ❌: Each dialog can be closed by escape key (BP): fail - Unable to test because no dialog was found
- ❌: Each modal dialog traps keyboard focus (R): fail - Unable to test because no dialog was found
- ❌: Each modal dialog takes focus when opened (R): fail - Unable to test because no dialog was found
- ❌: Focus is not lost when each dialog closes (R): fail - Unable to test because no dialog was found
- ❌: Each modal dialog hides content behind it while open (R): fail - Unable to test because no dialog was found
Axe WCAG Failures (4) ❌
- (2x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
- (2x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (4) ⚠️
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 13 (Claude Haiku 4.5)
Instruction set: Control
FAIL | Latency 0.00s cached
Axe WCAG: 4 | BP: 4 | $0.0087
Assertions ❌
- ❌: Each dialog has a dialog role (R): fail
- ❌: Each dialog can be closed by escape key (BP): fail - Unable to test because no dialog was found
- ❌: Each modal dialog traps keyboard focus (R): fail - Unable to test because no dialog was found
- ❌: Each modal dialog takes focus when opened (R): fail - Unable to test because no dialog was found
- ❌: Focus is not lost when each dialog closes (R): fail - Unable to test because no dialog was found
- ❌: Each modal dialog hides content behind it while open (R): fail - Unable to test because no dialog was found
Axe WCAG Failures (4) ❌
- (2x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
- (2x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (4) ⚠️
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 14 (Claude Haiku 4.5)
Instruction set: Control
FAIL | Latency 0.00s cached
Axe WCAG: 4 | BP: 4 | $0.0084
Assertions ❌
- ❌: Each dialog has a dialog role (R): fail
- ❌: Each dialog can be closed by escape key (BP): fail - Unable to test because no dialog was found
- ❌: Each modal dialog traps keyboard focus (R): fail - Unable to test because no dialog was found
- ❌: Each modal dialog takes focus when opened (R): fail - Unable to test because no dialog was found
- ❌: Focus is not lost when each dialog closes (R): fail - Unable to test because no dialog was found
- ❌: Each modal dialog hides content behind it while open (R): fail - Unable to test because no dialog was found
Axe WCAG Failures (4) ❌
- (2x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
- (2x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (4) ⚠️
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 15 (Claude Haiku 4.5)
Instruction set: Control
FAIL | Latency 0.00s cached
Axe WCAG: 4 | BP: 4 | $0.0083
Assertions ❌
- ❌: Each dialog has a dialog role (R): fail
- ❌: Each dialog can be closed by escape key (BP): fail - Unable to test because no dialog was found
- ❌: Each modal dialog traps keyboard focus (R): fail - Unable to test because no dialog was found
- ❌: Each modal dialog takes focus when opened (R): fail - Unable to test because no dialog was found
- ❌: Focus is not lost when each dialog closes (R): fail - Unable to test because no dialog was found
- ❌: Each modal dialog hides content behind it while open (R): fail - Unable to test because no dialog was found
Axe WCAG Failures (4) ❌
- (2x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
- (2x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (4) ⚠️
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 16 (Claude Haiku 4.5)
Instruction set: Control
FAIL | Latency 0.00s cached
Axe WCAG: 4 | BP: 4 | $0.0088
Assertions ❌
- ❌: Each dialog has a dialog role (R): fail
- ❌: Each dialog can be closed by escape key (BP): fail - Unable to test because no dialog was found
- ❌: Each modal dialog traps keyboard focus (R): fail - Unable to test because no dialog was found
- ❌: Each modal dialog takes focus when opened (R): fail - Unable to test because no dialog was found
- ❌: Focus is not lost when each dialog closes (R): fail - Unable to test because no dialog was found
- ❌: Each modal dialog hides content behind it while open (R): fail - Unable to test because no dialog was found
Axe WCAG Failures (4) ❌
- (2x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
- (2x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (4) ⚠️
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 17 (Claude Haiku 4.5)
Instruction set: Control
FAIL | Latency 0.00s cached
Axe WCAG: 4 | BP: 4 | $0.0088
Assertions ❌
- ❌: Each dialog has a dialog role (R): fail
- ❌: Each dialog can be closed by escape key (BP): fail - Unable to test because no dialog was found
- ❌: Each modal dialog traps keyboard focus (R): fail - Unable to test because no dialog was found
- ❌: Each modal dialog takes focus when opened (R): fail - Unable to test because no dialog was found
- ❌: Focus is not lost when each dialog closes (R): fail - Unable to test because no dialog was found
- ❌: Each modal dialog hides content behind it while open (R): fail - Unable to test because no dialog was found
Axe WCAG Failures (4) ❌
- (2x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
- (2x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (4) ⚠️
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 18 (Claude Haiku 4.5)
Instruction set: Control
FAIL | Latency 0.00s cached
Axe WCAG: 4 | BP: 4 | $0.0089
Assertions ❌
- ❌: Each dialog has a dialog role (R): fail
- ❌: Each dialog can be closed by escape key (BP): fail - Unable to test because no dialog was found
- ❌: Each modal dialog traps keyboard focus (R): fail - Unable to test because no dialog was found
- ❌: Each modal dialog takes focus when opened (R): fail - Unable to test because no dialog was found
- ❌: Focus is not lost when each dialog closes (R): fail - Unable to test because no dialog was found
- ❌: Each modal dialog hides content behind it while open (R): fail - Unable to test because no dialog was found
Axe WCAG Failures (4) ❌
- (2x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
- (2x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (4) ⚠️
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 19 (Claude Haiku 4.5)
Instruction set: Control
FAIL | Latency 0.00s cached
Axe WCAG: 4 | BP: 4 | $0.0089
Assertions ❌
- ❌: Each dialog has a dialog role (R): fail
- ❌: Each dialog can be closed by escape key (BP): fail - Unable to test because no dialog was found
- ❌: Each modal dialog traps keyboard focus (R): fail - Unable to test because no dialog was found
- ❌: Each modal dialog takes focus when opened (R): fail - Unable to test because no dialog was found
- ❌: Focus is not lost when each dialog closes (R): fail - Unable to test because no dialog was found
- ❌: Each modal dialog hides content behind it while open (R): fail - Unable to test because no dialog was found
Axe WCAG Failures (4) ❌
- (2x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
- (2x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (4) ⚠️
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 20 (Claude Haiku 4.5)
Instruction set: Control
FAIL | Latency 0.00s cached
Axe WCAG: 4 | BP: 4 | $0.0088
Assertions ❌
- ❌: Each dialog has a dialog role (R): fail
- ❌: Each dialog can be closed by escape key (BP): fail - Unable to test because no dialog was found
- ❌: Each modal dialog traps keyboard focus (R): fail - Unable to test because no dialog was found
- ❌: Each modal dialog takes focus when opened (R): fail - Unable to test because no dialog was found
- ❌: Focus is not lost when each dialog closes (R): fail - Unable to test because no dialog was found
- ❌: Each modal dialog hides content behind it while open (R): fail - Unable to test because no dialog was found
Axe WCAG Failures (4) ❌
- (2x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
- (2x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (4) ⚠️
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 21 (Claude Haiku 4.5)
Instruction set: Control
FAIL | Latency 0.00s cached
Axe WCAG: 4 | BP: 4 | $0.0087
Assertions ❌
- ❌: Each dialog has a dialog role (R): fail
- ❌: Each dialog can be closed by escape key (BP): fail - Unable to test because no dialog was found
- ❌: Each modal dialog traps keyboard focus (R): fail - Unable to test because no dialog was found
- ❌: Each modal dialog takes focus when opened (R): fail - Unable to test because no dialog was found
- ❌: Focus is not lost when each dialog closes (R): fail - Unable to test because no dialog was found
- ❌: Each modal dialog hides content behind it while open (R): fail - Unable to test because no dialog was found
Axe WCAG Failures (4) ❌
- (2x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
- (2x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (4) ⚠️
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 22 (Claude Haiku 4.5)
Instruction set: Control
FAIL | Latency 0.00s cached
Axe WCAG: 4 | BP: 4 | $0.0088
Assertions ❌
- ❌: Each dialog has a dialog role (R): fail
- ❌: Each dialog can be closed by escape key (BP): fail - Unable to test because no dialog was found
- ❌: Each modal dialog traps keyboard focus (R): fail - Unable to test because no dialog was found
- ❌: Each modal dialog takes focus when opened (R): fail - Unable to test because no dialog was found
- ❌: Focus is not lost when each dialog closes (R): fail - Unable to test because no dialog was found
- ❌: Each modal dialog hides content behind it while open (R): fail - Unable to test because no dialog was found
Axe WCAG Failures (4) ❌
- (2x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
- (2x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (4) ⚠️
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 23 (Claude Haiku 4.5)
Instruction set: Control
FAIL | Latency 0.00s cached
Axe WCAG: 4 | BP: 4 | $0.0086
Assertions ❌
- ❌: Each dialog has a dialog role (R): fail
- ❌: Each dialog can be closed by escape key (BP): fail - Unable to test because no dialog was found
- ❌: Each modal dialog traps keyboard focus (R): fail - Unable to test because no dialog was found
- ❌: Each modal dialog takes focus when opened (R): fail - Unable to test because no dialog was found
- ❌: Focus is not lost when each dialog closes (R): fail - Unable to test because no dialog was found
- ❌: Each modal dialog hides content behind it while open (R): fail - Unable to test because no dialog was found
Axe WCAG Failures (4) ❌
- (2x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
- (2x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (4) ⚠️
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 24 (Claude Haiku 4.5)
Instruction set: Control
FAIL | Latency 0.00s cached
Axe WCAG: 4 | BP: 4 | $0.0089
Assertions ❌
- ❌: Each dialog has a dialog role (R): fail
- ❌: Each dialog can be closed by escape key (BP): fail - Unable to test because no dialog was found
- ❌: Each modal dialog traps keyboard focus (R): fail - Unable to test because no dialog was found
- ❌: Each modal dialog takes focus when opened (R): fail - Unable to test because no dialog was found
- ❌: Focus is not lost when each dialog closes (R): fail - Unable to test because no dialog was found
- ❌: Each modal dialog hides content behind it while open (R): fail - Unable to test because no dialog was found
Axe WCAG Failures (4) ❌
- (2x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
- (2x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (4) ⚠️
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
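Note on the color-contrast failures listed above: the rule refers to the WCAG 2 AA minimum contrast ratios, which are 4.5:1 for normal text and 3:1 for large text. The sketch below shows how such a ratio is computed from two hex colors using the WCAG 2 relative-luminance formula. It is an illustration only, not how axe-core works internally. The function names (`relative_luminance`, `contrast_ratio`, `meets_aa`) and the example colors are assumptions made for this sketch and do not come from the tested samples.

```python
# Minimal sketch of the WCAG 2 contrast-ratio calculation.
# Function names and example colors are hypothetical, chosen for illustration.

def _linearize(channel_8bit: int) -> float:
    """Convert an 8-bit sRGB channel to its linear value (WCAG 2 definition)."""
    c = channel_8bit / 255
    return c / 12.92 if c <= 0.03928 else ((c + 0.055) / 1.055) ** 2.4


def relative_luminance(hex_color: str) -> float:
    """Relative luminance of a '#rrggbb' color, from 0.0 (black) to 1.0 (white)."""
    h = hex_color.lstrip("#")
    r, g, b = (int(h[i:i + 2], 16) for i in (0, 2, 4))
    return 0.2126 * _linearize(r) + 0.7152 * _linearize(g) + 0.0722 * _linearize(b)


def contrast_ratio(foreground: str, background: str) -> float:
    """Contrast ratio (lighter + 0.05) / (darker + 0.05), from 1.0 to 21.0."""
    lighter, darker = sorted(
        (relative_luminance(foreground), relative_luminance(background)),
        reverse=True,
    )
    return (lighter + 0.05) / (darker + 0.05)


def meets_aa(foreground: str, background: str, large_text: bool = False) -> bool:
    """True if the pair meets the WCAG 2 AA minimum for the given text size."""
    threshold = 3.0 if large_text else 4.5
    return contrast_ratio(foreground, background) >= threshold


if __name__ == "__main__":
    # Mid-grey text on white falls just short of 4.5:1 for normal text.
    print(round(contrast_ratio("#777777", "#ffffff"), 2))
    print(meets_aa("#777777", "#ffffff"))
```

A pair that misses the normal-text threshold can still meet the 3:1 large-text threshold, which is why `meets_aa` takes a `large_text` flag.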
Sample 0 (Claude Haiku 4.5)
Instruction set: 1. Basic
PASS | Latency 0.00s cached
Axe WCAG: 0 | $0.0107
Assertions ✅
- ✅: Each dialog has a dialog role (R): pass
- ✅: Each dialog can be closed by escape key (BP): pass
- ✅: Each modal dialog traps keyboard focus (R): pass
- ✅: Each modal dialog takes focus when opened (R): pass
- ✅: Focus is not lost when each dialog closes (R): pass
- ✅: Each modal dialog hides content behind it while open (R): pass
Sample 1 (Claude Haiku 4.5)
Instruction set: 1. Basic
PASS | Latency 0.00s cached
Axe WCAG: 0 | BP: 2 | $0.0108
Assertions ✅
- ✅: Each dialog has a dialog role (R): pass
- ✅: Each dialog can be closed by escape key (BP): pass
- ✅: Each modal dialog traps keyboard focus (R): pass
- ✅: Each modal dialog takes focus when opened (R): pass
- ✅: Focus is not lost when each dialog closes (R): pass
- ✅: Each modal dialog hides content behind it while open (R): pass
Axe Best Practice Issues (2) ⚠️
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 2 (Claude Haiku 4.5)
Instruction set: 1. Basic
PASS | Latency 0.00s cached
Axe WCAG: 0 | BP: 2 | $0.0109
Assertions ✅
- ✅: Each dialog has a dialog role (R): pass
- ✅: Each dialog can be closed by escape key (BP): pass
- ✅: Each modal dialog traps keyboard focus (R): pass
- ✅: Each modal dialog takes focus when opened (R): pass
- ✅: Focus is not lost when each dialog closes (R): pass
- ✅: Each modal dialog hides content behind it while open (R): pass
Axe Best Practice Issues (2) ⚠️
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 3 (Claude Haiku 4.5)
Instruction set: 1. Basic
PASS | Latency 0.00s cached
Axe WCAG: 0 | BP: 2 | $0.0108
Assertions ✅
- ✅: Each dialog has a dialog role (R): pass
- ✅: Each dialog can be closed by escape key (BP): pass
- ✅: Each modal dialog traps keyboard focus (R): pass
- ✅: Each modal dialog takes focus when opened (R): pass
- ✅: Focus is not lost when each dialog closes (R): pass
- ✅: Each modal dialog hides content behind it while open (R): pass
Axe Best Practice Issues (2) ⚠️
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 4 (Claude Haiku 4.5)
Instruction set: 1. Basic
PASS | Latency 0.00s cached
Axe WCAG: 0 | BP: 2 | $0.0132
Assertions ✅
- ✅: Each dialog has a dialog role (R): pass
- ✅: Each dialog can be closed by escape key (BP): pass
- ✅: Each modal dialog traps keyboard focus (R): pass
- ✅: Each modal dialog takes focus when opened (R): pass
- ✅: Focus is not lost when each dialog closes (R): pass
- ✅: Each modal dialog hides content behind it while open (R): pass
Axe Best Practice Issues (2) ⚠️
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
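Note on the PASS/FAIL verdicts: the samples listed above are consistent with a simple rule in which a sample passes only when it has zero Axe WCAG failures and every assertion tagged (R) passes, while Axe best-practice issues do not affect the verdict. The sketch below encodes that reading. It is an inference from the listed results, not the benchmark's actual scoring code. The names `Assertion` and `sample_passes` are hypothetical, and treating (BP) assertions as non-gating is an assumption, since no sample in this section fails only a (BP) assertion.

```python
# Minimal sketch of the pass/fail rule inferred from the listed samples.
# Assertion and sample_passes are hypothetical names, not the benchmark's API.
from dataclasses import dataclass


@dataclass(frozen=True)
class Assertion:
    name: str
    kind: str      # "R" or "BP", as tagged in the report
    passed: bool


def sample_passes(axe_wcag_failures: int, assertions: list[Assertion]) -> bool:
    """Pass only with zero Axe WCAG failures and all (R) assertions passing.

    Assumption: (BP) assertions and Axe best-practice issues are ignored.
    """
    required_ok = all(a.passed for a in assertions if a.kind == "R")
    return axe_wcag_failures == 0 and required_ok


if __name__ == "__main__":
    # Shape of a sample with no Axe WCAG failures but one failed (R) assertion.
    checks = [
        Assertion("Each dialog has a dialog role", "R", True),
        Assertion("Each dialog can be closed by escape key", "BP", True),
        Assertion("Each modal dialog hides content behind it while open", "R", False),
    ]
    print(sample_passes(0, checks))
```

Under this reading, a sample with clean assertions still fails if any Axe WCAG failure is reported, and a sample with best-practice issues alone still passes.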
Sample 0 (Claude Haiku 4.5)
Instruction set: 0. Minimal
FAIL | Latency 0.00s cached
Axe WCAG: 4 | $0.0102
Assertions ❌
- ✅: Each dialog has a dialog role (R): pass
- ✅: Each dialog can be closed by escape key (BP): pass
- ✅: Each modal dialog traps keyboard focus (R): pass
- ✅: Each modal dialog takes focus when opened (R): pass
- ✅: Focus is not lost when each dialog closes (R): pass
- ❌: Each modal dialog hides content behind it while open (R): fail
Axe WCAG Failures (4) ❌
- (2x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
- (2x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Sample 1 (Claude Haiku 4.5)
Instruction set: 0. Minimal
FAIL | Latency 0.00s cached
Axe WCAG: 4 | $0.0100
Assertions ❌
- ✅: Each dialog has a dialog role (R): pass
- ✅: Each dialog can be closed by escape key (BP): pass
- ✅: Each modal dialog traps keyboard focus (R): pass
- ✅: Each modal dialog takes focus when opened (R): pass
- ✅: Focus is not lost when each dialog closes (R): pass
- ❌: Each modal dialog hides content behind it while open (R): fail
Axe WCAG Failures (4) ❌
- (2x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
- (2x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Sample 2 (Claude Haiku 4.5)
Instruction set: 0. Minimal
FAIL | Latency 0.00s cached
Axe WCAG: 4 | $0.0105
Assertions ❌
- ✅: Each dialog has a dialog role (R): pass
- ✅: Each dialog can be closed by escape key (BP): pass
- ✅: Each modal dialog traps keyboard focus (R): pass
- ✅: Each modal dialog takes focus when opened (R): pass
- ✅: Focus is not lost when each dialog closes (R): pass
- ❌: Each modal dialog hides content behind it while open (R): fail
Axe WCAG Failures (4) ❌
- (2x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
- (2x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Sample 3 (Claude Haiku 4.5)
Instruction set: 0. Minimal
FAIL | Latency 0.00s cached
Axe WCAG: 4 | $0.0100
Assertions ❌
- ✅: Each dialog has a dialog role (R): pass
- ✅: Each dialog can be closed by escape key (BP): pass
- ✅: Each modal dialog traps keyboard focus (R): pass
- ✅: Each modal dialog takes focus when opened (R): pass
- ✅: Focus is not lost when each dialog closes (R): pass
- ❌: Each modal dialog hides content behind it while open (R): fail
Axe WCAG Failures (4) ❌
- (2x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
- (2x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Sample 4 (Claude Haiku 4.5)
Instruction set: 0. Minimal
FAIL | Latency 0.00s cached
Axe WCAG: 4 | $0.0100
Assertions ❌
- ✅: Each dialog has a dialog role (R): pass
- ✅: Each dialog can be closed by escape key (BP): pass
- ✅: Each modal dialog traps keyboard focus (R): pass
- ✅: Each modal dialog takes focus when opened (R): pass
- ✅: Focus is not lost when each dialog closes (R): pass
- ❌: Each modal dialog hides content behind it while open (R): fail
Axe WCAG Failures (4) ❌
- (2x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
- (2x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Sample 0 (Claude Haiku 4.5)
Instruction set: 2. Detailed Instructions
FAIL | Latency 0.00s cached
Axe WCAG: 0 | $0.0196
Assertions ❌
- ✅: Each dialog has a dialog role (R): pass
- ✅: Each dialog can be closed by escape key (BP): pass
- ✅: Each modal dialog traps keyboard focus (R): pass
- ✅: Each modal dialog takes focus when opened (R): pass
- ✅: Focus is not lost when each dialog closes (R): pass
- ❌: Each modal dialog hides content behind it while open (R): fail
Sample 1 (Claude Haiku 4.5)
Instruction set: 2. Detailed Instructions
FAIL | Latency 0.00s cached
Axe WCAG: 0 | $0.0193
Assertions ❌
- ✅: Each dialog has a dialog role (R): pass
- ✅: Each dialog can be closed by escape key (BP): pass
- ✅: Each modal dialog traps keyboard focus (R): pass
- ✅: Each modal dialog takes focus when opened (R): pass
- ✅: Focus is not lost when each dialog closes (R): pass
- ❌: Each modal dialog hides content behind it while open (R): fail
Sample 2 (Claude Haiku 4.5)
Instruction set: 2. Detailed Instructions
PASS | Latency 13.80s
Axe WCAG: 0 | $0.0175
Assertions ✅
- ✅: Each dialog has a dialog role (R): pass
- ✅: Each dialog can be closed by escape key (BP): pass
- ✅: Each modal dialog traps keyboard focus (R): pass
- ✅: Each modal dialog takes focus when opened (R): pass
- ✅: Focus is not lost when each dialog closes (R): pass
- ✅: Each modal dialog hides content behind it while open (R): pass
Sample 3 (Claude Haiku 4.5)
Instruction set: 2. Detailed Instructions
FAIL | Latency 15.84s
Axe WCAG: 0 | $0.0203
Assertions ❌
- ✅: Each dialog has a dialog role (R): pass
- ✅: Each dialog can be closed by escape key (BP): pass
- ✅: Each modal dialog traps keyboard focus (R): pass
- ❌: Each modal dialog takes focus when opened (R): fail
- ✅: Focus is not lost when each dialog closes (R): pass
- ❌: Each modal dialog hides content behind it while open (R): fail
Sample 4 (Claude Haiku 4.5)
Instruction set: 2. Detailed Instructions
FAIL | Latency 16.50s
Axe WCAG: 0 | $0.0196
Assertions ❌
- ✅: Each dialog has a dialog role (R): pass
- ✅: Each dialog can be closed by escape key (BP): pass
- ✅: Each modal dialog traps keyboard focus (R): pass
- ❌: Each modal dialog takes focus when opened (R): fail
- ✅: Focus is not lost when each dialog closes (R): pass
- ❌: Each modal dialog hides content behind it while open (R): fail
Claude Opus 4.6
- Control: 0%
- 1. Basic: 0%
- 0. Minimal: 0%
- 2. Detailed Instructions: 80%
Aggregates are shown when filtering to a specific instruction set.
Instruction set: Control | Samples: 25 | Passes: 0
| pass@1 | pass@5 | pass@10 |
|---|---|---|
| 0% | 0% | 0% |
Instruction set: 1. Basic | Samples: 5 | Passes: 0
| pass@1 | pass@5 | pass@10 |
|---|---|---|
| 0% | 0% | 0% |
Instruction set: 0. Minimal | Samples: 5 | Passes: 0
| pass@1 | pass@5 | pass@10 |
|---|---|---|
| 0% | 0% | 0% |
Instruction set: 2. Detailed Instructions | Samples: 5 | Passes: 4
| pass@1 | pass@5 | pass@10 |
|---|---|---|
| 80% | 100% | 100% |
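The pass@k figures above are consistent with the standard unbiased estimator, 1 - C(n-c, k) / C(n, k), for n samples with c passes. The sketch below assumes that estimator; it is not the report's confirmed implementation. The function name `pass_at_k` and the handling of c = 0 and k > n are illustrative choices that happen to reproduce the 0% and 100% entries shown for the 5-sample sets.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Estimate the chance that at least one of k samples drawn from n passes.

    n: total samples, c: passing samples, k: draw size.
    Assumption: the standard unbiased estimator 1 - C(n-c, k) / C(n, k).
    """
    if c == 0:
        return 0.0  # no passing sample exists, so no draw can contain one
    if n - c < k:
        # Fewer than k failures, so any draw of k must include a pass.
        # This branch also covers k > n, which is an assumption here.
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)


# The 5-sample, 4-pass aggregate shown above for Claude Opus 4.6.
for k in (1, 5, 10):
    print(f"pass@{k} = {pass_at_k(5, 4, k):.0%}")
```

With only 5 samples per instruction set, pass@5 and pass@10 saturate at 100% as soon as a single sample passes, so pass@1 is the more informative column for these small sets.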
Sample 0 (Claude Opus 4.6)
Instruction set: Control
FAIL | Latency 0.00s cached
Axe WCAG: 0 | $0.0465
Assertions ❌
- ✅: Each dialog has a dialog role (R): pass
- ✅: Each dialog can be closed by escape key (BP): pass
- ✅: Each modal dialog traps keyboard focus (R): pass
- ❌: Each modal dialog takes focus when opened (R): fail
- ✅: Focus is not lost when each dialog closes (R): pass
- ❌: Each modal dialog hides content behind it while open (R): fail
Sample 1 (Claude Opus 4.6)
Instruction set: Control
FAIL | Latency 0.00s cached
Axe WCAG: 0 | $0.0467
Assertions ❌
- ✅: Each dialog has a dialog role (R): pass
- ✅: Each dialog can be closed by escape key (BP): pass
- ✅: Each modal dialog traps keyboard focus (R): pass
- ❌: Each modal dialog takes focus when opened (R): fail
- ✅: Focus is not lost when each dialog closes (R): pass
- ❌: Each modal dialog hides content behind it while open (R): fail
Sample 2 (Claude Opus 4.6)
Instruction set: Control
FAIL | Latency 0.00s cached
Axe WCAG: 0 | $0.0466
Assertions ❌
- ✅: Each dialog has a dialog role (R): pass
- ✅: Each dialog can be closed by escape key (BP): pass
- ✅: Each modal dialog traps keyboard focus (R): pass
- ❌: Each modal dialog takes focus when opened (R): fail
- ✅: Focus is not lost when each dialog closes (R): pass
- ❌: Each modal dialog hides content behind it while open (R): fail
Sample 3 (Claude Opus 4.6)
Instruction set: Control
FAIL | Latency 0.00s cached
Axe WCAG: 0 | $0.0463
Assertions ❌
- ✅: Each dialog has a dialog role (R): pass
- ✅: Each dialog can be closed by escape key (BP): pass
- ✅: Each modal dialog traps keyboard focus (R): pass
- ❌: Each modal dialog takes focus when opened (R): fail
- ✅: Focus is not lost when each dialog closes (R): pass
- ❌: Each modal dialog hides content behind it while open (R): fail
Sample 4 (Claude Opus 4.6)
Instruction set: Control
FAIL | Latency 0.00s cached
Axe WCAG: 4 | $0.0468
Assertions ❌
- ✅: Each dialog has a dialog role (R): pass
- ❌: Each dialog can be closed by escape key (BP): fail - locator.evaluate: AbortError: The user aborted a request.
- ✅: Each modal dialog traps keyboard focus (R): pass
- ❌: Each modal dialog takes focus when opened (R): fail
- ✅: Focus is not lost when each dialog closes (R): pass
- ❌: Each modal dialog hides content behind it while open (R): fail
Axe WCAG Failures (4) ❌
- (2x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
- (2x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Sample 5 (Claude Opus 4.6)
Instruction set: Control
FAIL | Latency 0.00s cached
Axe WCAG: 0 | $0.0465
Assertions ❌
- ✅: Each dialog has a dialog role (R): pass
- ✅: Each dialog can be closed by escape key (BP): pass
- ✅: Each modal dialog traps keyboard focus (R): pass
- ❌: Each modal dialog takes focus when opened (R): fail
- ✅: Focus is not lost when each dialog closes (R): pass
- ❌: Each modal dialog hides content behind it while open (R): fail
Sample 6 (Claude Opus 4.6)
Instruction set: Control
FAIL | Latency 0.00s cached
Axe WCAG: 4 | $0.0481
Assertions ❌
- ✅: Each dialog has a dialog role (R): pass
- ✅: Each dialog can be closed by escape key (BP): pass
- ✅: Each modal dialog traps keyboard focus (R): pass
- ❌: Each modal dialog takes focus when opened (R): fail
- ✅: Focus is not lost when each dialog closes (R): pass
- ❌: Each modal dialog hides content behind it while open (R): fail
Axe WCAG Failures (4) ❌
- (2x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
- (2x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Sample 7 (Claude Opus 4.6)
Instruction set: Control
FAIL | Latency 0.00s cached
Axe WCAG: 4 | $0.0479
Assertions ❌
- ✅: Each dialog has a dialog role (R): pass
- ❌: Each dialog can be closed by escape key (BP): fail - locator.evaluate: AbortError: The user aborted a request.
- ✅: Each modal dialog traps keyboard focus (R): pass
- ❌: Each modal dialog takes focus when opened (R): fail
- ✅: Focus is not lost when each dialog closes (R): pass
- ❌: Each modal dialog hides content behind it while open (R): fail
Axe WCAG Failures (4) ❌
- (2x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
- (2x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Sample 8 (Claude Opus 4.6)
Instruction set: Control
FAIL | Latency 0.00s cached
Axe WCAG: 0 | $0.0464
Assertions ❌
- ✅: Each dialog has a dialog role (R): pass
- ✅: Each dialog can be closed by escape key (BP): pass
- ✅: Each modal dialog traps keyboard focus (R): pass
- ❌: Each modal dialog takes focus when opened (R): fail
- ✅: Focus is not lost when each dialog closes (R): pass
- ❌: Each modal dialog hides content behind it while open (R): fail
Sample 9 (Claude Opus 4.6)
Instruction set: Control
FAIL | Latency 0.00s cached
Axe WCAG: 4 | $0.0471
Assertions ❌
- ✅: Each dialog has a dialog role (R): pass
- ✅: Each dialog can be closed by escape key (BP): pass
- ✅: Each modal dialog traps keyboard focus (R): pass
- ❌: Each modal dialog takes focus when opened (R): fail
- ✅: Focus is not lost when each dialog closes (R): pass
- ❌: Each modal dialog hides content behind it while open (R): fail
Axe WCAG Failures (4) ❌
- (2x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
- (2x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Sample 10 (Claude Opus 4.6)
Instruction set: Control
FAIL | Latency 0.00s cached
Axe WCAG: 0 | $0.0462
Assertions ❌
- ✅: Each dialog has a dialog role (R): pass
- ✅: Each dialog can be closed by escape key (BP): pass
- ✅: Each modal dialog traps keyboard focus (R): pass
- ❌: Each modal dialog takes focus when opened (R): fail
- ✅: Focus is not lost when each dialog closes (R): pass
- ❌: Each modal dialog hides content behind it while open (R): fail
Sample 11 (Claude Opus 4.6)
Instruction set: Control
FAIL | Latency 0.00s cached
Axe WCAG: 0 | $0.0462
Assertions ❌
- ✅: Each dialog has a dialog role (R): pass
- ✅: Each dialog can be closed by escape key (BP): pass
- ✅: Each modal dialog traps keyboard focus (R): pass
- ❌: Each modal dialog takes focus when opened (R): fail
- ✅: Focus is not lost when each dialog closes (R): pass
- ❌: Each modal dialog hides content behind it while open (R): fail
Sample 12 (Claude Opus 4.6)
Instruction set: Control
FAIL | Latency 0.00s cached
Axe WCAG: 0 | $0.0465
Assertions ❌
- ✅: Each dialog has a dialog role (R): pass
- ✅: Each dialog can be closed by escape key (BP): pass
- ✅: Each modal dialog traps keyboard focus (R): pass
- ❌: Each modal dialog takes focus when opened (R): fail
- ✅: Focus is not lost when each dialog closes (R): pass
- ❌: Each modal dialog hides content behind it while open (R): fail
Sample 13 (Claude Opus 4.6)
Instruction set: Control
FAIL | Latency 0.00s cached
Axe WCAG: 0 | $0.0463
Assertions ❌
- ✅: Each dialog has a dialog role (R): pass
- ✅: Each dialog can be closed by escape key (BP): pass
- ✅: Each modal dialog traps keyboard focus (R): pass
- ❌: Each modal dialog takes focus when opened (R): fail
- ✅: Focus is not lost when each dialog closes (R): pass
- ❌: Each modal dialog hides content behind it while open (R): fail
Sample 14 (Claude Opus 4.6)
Instruction set: Control
FAIL | Latency 0.00s cached
Axe WCAG: 0 | $0.0453
Assertions ❌
- ✅: Each dialog has a dialog role (R): pass
- ✅: Each dialog can be closed by escape key (BP): pass
- ✅: Each modal dialog traps keyboard focus (R): pass
- ❌: Each modal dialog takes focus when opened (R): fail
- ✅: Focus is not lost when each dialog closes (R): pass
- ❌: Each modal dialog hides content behind it while open (R): fail
Sample 15 (Claude Opus 4.6)
Instruction set: Control
FAIL | Latency 0.00s cached
Axe WCAG: 0 | $0.0463
Assertions ❌
- ✅: Each dialog has a dialog role (R): pass
- ✅: Each dialog can be closed by escape key (BP): pass
- ✅: Each modal dialog traps keyboard focus (R): pass
- ❌: Each modal dialog takes focus when opened (R): fail
- ✅: Focus is not lost when each dialog closes (R): pass
- ❌: Each modal dialog hides content behind it while open (R): fail
Sample 16 (Claude Opus 4.6)
Instruction set: Control
FAIL | Latency 0.00s cached
Axe WCAG: 0 | $0.0463
Assertions ❌
- ✅: Each dialog has a dialog role (R): pass
- ✅: Each dialog can be closed by escape key (BP): pass
- ✅: Each modal dialog traps keyboard focus (R): pass
- ❌: Each modal dialog takes focus when opened (R): fail
- ✅: Focus is not lost when each dialog closes (R): pass
- ❌: Each modal dialog hides content behind it while open (R): fail
Sample 17 (Claude Opus 4.6)
Instruction set: Control
FAIL | Latency 0.00s cached
Axe WCAG: 0 | $0.0454
Assertions ❌
- ✅: Each dialog has a dialog role (R): pass
- ✅: Each dialog can be closed by escape key (BP): pass
- ✅: Each modal dialog traps keyboard focus (R): pass
- ❌: Each modal dialog takes focus when opened (R): fail
- ✅: Focus is not lost when each dialog closes (R): pass
- ❌: Each modal dialog hides content behind it while open (R): fail
Sample 18 (Claude Opus 4.6)
Instruction set: Control
FAIL | Latency 0.00s cached
Axe WCAG: 0 | $0.0463
Assertions ❌
- ✅: Each dialog has a dialog role (R): pass
- ✅: Each dialog can be closed by escape key (BP): pass
- ✅: Each modal dialog traps keyboard focus (R): pass
- ❌: Each modal dialog takes focus when opened (R): fail
- ✅: Focus is not lost when each dialog closes (R): pass
- ❌: Each modal dialog hides content behind it while open (R): fail
Sample 19 (Claude Opus 4.6)
Instruction set: Control
FAIL | Latency 0.00s cached
Axe WCAG: 4 | $0.0475
Assertions ❌
- ✅: Each dialog has a dialog role (R): pass
- ❌: Each dialog can be closed by escape key (BP): fail - locator.evaluate: AbortError: The user aborted a request.
- ✅: Each modal dialog traps keyboard focus (R): pass
- ❌: Each modal dialog takes focus when opened (R): fail
- ✅: Focus is not lost when each dialog closes (R): pass
- ❌: Each modal dialog hides content behind it while open (R): fail
Axe WCAG Failures (4) ❌
- (2x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
- (2x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Sample 20 (Claude Opus 4.6)
Instruction set: Control
FAIL | Latency 0.00s cached
Axe WCAG: 0 | $0.0465
Assertions ❌
- ✅: Each dialog has a dialog role (R): pass
- ✅: Each dialog can be closed by escape key (BP): pass
- ✅: Each modal dialog traps keyboard focus (R): pass
- ❌: Each modal dialog takes focus when opened (R): fail
- ✅: Focus is not lost when each dialog closes (R): pass
- ❌: Each modal dialog hides content behind it while open (R): fail
Sample 21 (Claude Opus 4.6)
Instruction set: Control
FAIL | Latency 0.00s cached
Axe WCAG: 0 | $0.0465
Assertions ❌
- ✅: Each dialog has a dialog role (R): pass
- ✅: Each dialog can be closed by escape key (BP): pass
- ✅: Each modal dialog traps keyboard focus (R): pass
- ❌: Each modal dialog takes focus when opened (R): fail
- ✅: Focus is not lost when each dialog closes (R): pass
- ❌: Each modal dialog hides content behind it while open (R): fail
Sample 22 (Claude Opus 4.6)
Instruction set: Control
FAIL | Latency 0.00s cached
Axe WCAG: 4 | $0.0480
Assertions ❌
- ✅: Each dialog has a dialog role (R): pass
- ✅: Each dialog can be closed by escape key (BP): pass
- ✅: Each modal dialog traps keyboard focus (R): pass
- ❌: Each modal dialog takes focus when opened (R): fail
- ✅: Focus is not lost when each dialog closes (R): pass
- ❌: Each modal dialog hides content behind it while open (R): fail
Axe WCAG Failures (4) ❌
- (2x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
- (2x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Sample 23 (Claude Opus 4.6)
Instruction set: Control
FAIL | Latency 0.00s cached
Axe WCAG: 0 | $0.0465
Assertions ❌
- ✅: Each dialog has a dialog role (R): pass
- ✅: Each dialog can be closed by escape key (BP): pass
- ✅: Each modal dialog traps keyboard focus (R): pass
- ❌: Each modal dialog takes focus when opened (R): fail
- ✅: Focus is not lost when each dialog closes (R): pass
- ❌: Each modal dialog hides content behind it while open (R): fail
Sample 24 (Claude Opus 4.6)
Instruction set: Control
FAIL | Latency 0.00s cached
Axe WCAG: 0 | $0.0462
Assertions ❌
- ✅: Each dialog has a dialog role (R): pass
- ✅: Each dialog can be closed by escape key (BP): pass
- ✅: Each modal dialog traps keyboard focus (R): pass
- ❌: Each modal dialog takes focus when opened (R): fail
- ✅: Focus is not lost when each dialog closes (R): pass
- ❌: Each modal dialog hides content behind it while open (R): fail
Sample 0 (Claude Opus 4.6)
Instruction set: 1. Basic
FAIL | Latency 0.00s cached
Axe WCAG: 0 | $0.0574
Assertions ❌
- ✅: Each dialog has a dialog role (R): pass
- ✅: Each dialog can be closed by escape key (BP): pass
- ✅: Each modal dialog traps keyboard focus (R): pass
- ✅: Each modal dialog takes focus when opened (R): pass
- ✅: Focus is not lost when each dialog closes (R): pass
- ❌: Each modal dialog hides content behind it while open (R): fail
Sample 1 (Claude Opus 4.6)
Instruction set: 1. Basic
FAIL | Latency 0.00s cached
Axe WCAG: 0 | $0.0632
Assertions ❌
- ✅: Each dialog has a dialog role (R): pass
- ✅: Each dialog can be closed by escape key (BP): pass
- ✅: Each modal dialog traps keyboard focus (R): pass
- ✅: Each modal dialog takes focus when opened (R): pass
- ✅: Focus is not lost when each dialog closes (R): pass
- ❌: Each modal dialog hides content behind it while open (R): fail
Sample 2 (Claude Opus 4.6)
Instruction set: 1. Basic
FAIL | Latency 0.00s cached
Axe WCAG: 0 | $0.0570
Assertions ❌
- ✅: Each dialog has a dialog role (R): pass
- ✅: Each dialog can be closed by escape key (BP): pass
- ✅: Each modal dialog traps keyboard focus (R): pass
- ✅: Each modal dialog takes focus when opened (R): pass
- ✅: Focus is not lost when each dialog closes (R): pass
- ❌: Each modal dialog hides content behind it while open (R): fail
Sample 3 (Claude Opus 4.6)
Instruction set: 1. Basic
FAIL | Latency 0.00s cached
Axe WCAG: 0 | $0.0586
Assertions ❌
- ✅: Each dialog has a dialog role (R): pass
- ✅: Each dialog can be closed by escape key (BP): pass
- ✅: Each modal dialog traps keyboard focus (R): pass
- ✅: Each modal dialog takes focus when opened (R): pass
- ✅: Focus is not lost when each dialog closes (R): pass
- ❌: Each modal dialog hides content behind it while open (R): fail
Sample 4 (Claude Opus 4.6)
Instruction set: 1. Basic
FAIL | Latency 0.00s cached
Axe WCAG: 0 | $0.0643
Assertions ❌
- ✅: Each dialog has a dialog role (R): pass
- ✅: Each dialog can be closed by escape key (BP): pass
- ✅: Each modal dialog traps keyboard focus (R): pass
- ✅: Each modal dialog takes focus when opened (R): pass
- ✅: Focus is not lost when each dialog closes (R): pass
- ❌: Each modal dialog hides content behind it while open (R): fail
Sample 0 (Claude Opus 4.6)
Instruction set: 0. Minimal
FAIL | Latency 0.00s cached
Axe WCAG: 4 | $0.0568
Assertions ❌
- ✅: Each dialog has a dialog role (R): pass
- ✅: Each dialog can be closed by escape key (BP): pass
- ✅: Each modal dialog traps keyboard focus (R): pass
- ✅: Each modal dialog takes focus when opened (R): pass
- ✅: Focus is not lost when each dialog closes (R): pass
- ❌: Each modal dialog hides content behind it while open (R): fail
Axe WCAG Failures (4) ❌
- (2x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
- (2x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Sample 1 (Claude Opus 4.6)
Instruction set: 0. Minimal
FAIL | Latency 0.00s cached
Axe WCAG: 4 | $0.0571
Assertions ❌
- ✅: Each dialog has a dialog role (R): pass
- ✅: Each dialog can be closed by escape key (BP): pass
- ✅: Each modal dialog traps keyboard focus (R): pass
- ✅: Each modal dialog takes focus when opened (R): pass
- ✅: Focus is not lost when each dialog closes (R): pass
- ❌: Each modal dialog hides content behind it while open (R): fail
Axe WCAG Failures (4) ❌
- (2x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
- (2x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Sample 2 (Claude Opus 4.6)
Instruction set: 0. Minimal
FAIL | Latency 0.00s cached
Axe WCAG: 4 | $0.0568
Assertions ❌
- ✅: Each dialog has a dialog role (R): pass
- ✅: Each dialog can be closed by escape key (BP): pass
- ✅: Each modal dialog traps keyboard focus (R): pass
- ✅: Each modal dialog takes focus when opened (R): pass
- ✅: Focus is not lost when each dialog closes (R): pass
- ❌: Each modal dialog hides content behind it while open (R): fail
Axe WCAG Failures (4) ❌
- (2x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
- (2x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Sample 3 (Claude Opus 4.6)
Instruction set: 0. Minimal
FAIL | Latency 0.00s cached
Axe WCAG: 4 | $0.0567
Assertions ❌
- ✅: Each dialog has a dialog role (R): pass
- ✅: Each dialog can be closed by escape key (BP): pass
- ✅: Each modal dialog traps keyboard focus (R): pass
- ✅: Each modal dialog takes focus when opened (R): pass
- ✅: Focus is not lost when each dialog closes (R): pass
- ❌: Each modal dialog hides content behind it while open (R): fail
Axe WCAG Failures (4) ❌
- (2x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
- (2x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Sample 4 (Claude Opus 4.6)
Instruction set: 0. Minimal
FAIL | Latency 0.00s cached
Axe WCAG: 4 | $0.0549
Assertions ❌
- ✅: Each dialog has a dialog role (R): pass
- ✅: Each dialog can be closed by escape key (BP): pass
- ✅: Each modal dialog traps keyboard focus (R): pass
- ✅: Each modal dialog takes focus when opened (R): pass
- ✅: Focus is not lost when each dialog closes (R): pass
- ❌: Each modal dialog hides content behind it while open (R): fail
Axe WCAG Failures (4) ❌
- (2x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
- (2x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Sample 0 (Claude Opus 4.6)
Instruction set: 2. Detailed Instructions
PASS | Latency 0.00s cached
Axe WCAG: 0 | $0.0882
Assertions ✅
- ✅: Each dialog has a dialog role (R): pass
- ✅: Each dialog can be closed by escape key (BP): pass
- ✅: Each modal dialog traps keyboard focus (R): pass
- ✅: Each modal dialog takes focus when opened (R): pass
- ✅: Focus is not lost when each dialog closes (R): pass
- ✅: Each modal dialog hides content behind it while open (R): pass
Sample 1 (Claude Opus 4.6)
Instruction set: 2. Detailed Instructions
PASS | Latency 0.00s cached
Axe WCAG: 0 | $0.0913
Assertions ✅
- ✅: Each dialog has a dialog role (R): pass
- ✅: Each dialog can be closed by escape key (BP): pass
- ✅: Each modal dialog traps keyboard focus (R): pass
- ✅: Each modal dialog takes focus when opened (R): pass
- ✅: Focus is not lost when each dialog closes (R): pass
- ✅: Each modal dialog hides content behind it while open (R): pass
Sample 2 (Claude Opus 4.6)
Instruction set: 2. Detailed Instructions
PASS | Latency 31.56s
Axe WCAG: 0 | $0.0868
Assertions ✅
- ✅: Each dialog has a dialog role (R): pass
- ✅: Each dialog can be closed by escape key (BP): pass
- ✅: Each modal dialog traps keyboard focus (R): pass
- ✅: Each modal dialog takes focus when opened (R): pass
- ✅: Focus is not lost when each dialog closes (R): pass
- ✅: Each modal dialog hides content behind it while open (R): pass
Sample 3 (Claude Opus 4.6)
Instruction set: 2. Detailed Instructions
PASS | Latency 34.55s
Axe WCAG: 0 | $0.0929
Assertions ✅
- ✅: Each dialog has a dialog role (R): pass
- ✅: Each dialog can be closed by escape key (BP): pass
- ✅: Each modal dialog traps keyboard focus (R): pass
- ✅: Each modal dialog takes focus when opened (R): pass
- ✅: Focus is not lost when each dialog closes (R): pass
- ✅: Each modal dialog hides content behind it while open (R): pass
Sample 4 (Claude Opus 4.6)
Instruction set: 2. Detailed Instructions
FAIL | Latency 38.81s
Axe WCAG: 0 | $0.1023
Assertions ❌
- ✅: Each dialog has a dialog role (R): pass
- ✅: Each dialog can be closed by escape key (BP): pass
- ✅: Each modal dialog traps keyboard focus (R): pass
- ✅: Each modal dialog takes focus when opened (R): pass
- ✅: Focus is not lost when each dialog closes (R): pass
- ❌: Each modal dialog hides content behind it while open (R): fail
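To see which assertions drive the failures across a model's samples, the per-sample lines can be tallied. The following is a minimal sketch that assumes the plain-text layout used in this report (`- ❌: <assertion> (R|BP): fail ...`). The function name `tally_failed_assertions` and the regular expression are illustrative and not part of the benchmark's tooling.

```python
import re
from collections import Counter

# Matches failed assertion lines such as:
# "- ❌: Each modal dialog traps keyboard focus (R): fail - <optional detail>"
FAILED_LINE = re.compile(r"^- ❌: (?P<name>.+?) \((?P<kind>R|BP)\): fail")


def tally_failed_assertions(report_text: str) -> Counter:
    """Count how often each assertion fails in a block of report text."""
    counts: Counter = Counter()
    for line in report_text.splitlines():
        match = FAILED_LINE.match(line.strip())
        if match:
            counts[match.group("name")] += 1
    return counts


# A short excerpt in the same layout as the sample listings.
sample = """\
- ✅: Each dialog has a dialog role (R): pass
- ❌: Each modal dialog takes focus when opened (R): fail
- ❌: Each modal dialog hides content behind it while open (R): fail
- ✅: Each dialog has a dialog role (R): pass
- ❌: Each modal dialog hides content behind it while open (R): fail - Unable to test because no dialog was found
"""

for name, count in tally_failed_assertions(sample).most_common():
    print(f"{count}x {name}")
```

Run over the Claude Opus 4.6 listings, a tally of this kind would surface "Each modal dialog hides content behind it while open" as the assertion that fails in every non-passing sample.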
Claude Sonnet 4.5
- Control: 0%
- 1. Basic: 0%
- 0. Minimal: 0%
- 2. Detailed Instructions: 0%
Aggregates are shown when filtering to a specific instruction set.
Instruction set: Control | Samples: 25 | Passes: 0
| pass@1 | pass@5 | pass@10 |
|---|---|---|
| 0% | 0% | 0% |
Instruction set: 1. Basic | Samples: 5 | Passes: 0
| pass@1 | pass@5 | pass@10 |
|---|---|---|
| 0% | 0% | 0% |
Instruction set: 0. Minimal | Samples: 5 | Passes: 0
| pass@1 | pass@5 | pass@10 |
|---|---|---|
| 0% | 0% | 0% |
Instruction set: 2. Detailed Instructions | Samples: 5 | Passes: 0
| pass@1 | pass@5 | pass@10 |
|---|---|---|
| 0% | 0% | 0% |
Sample 0 (Claude Sonnet 4.5)
Instruction set: Control
FAIL | Latency 0.00s cached
Axe WCAG: 4 | BP: 4 | $0.0266
Assertions ❌
- ❌: Each dialog has a dialog role (R): fail
- ❌: Each dialog can be closed by escape key (BP): fail - Unable to test because no dialog was found
- ❌: Each modal dialog traps keyboard focus (R): fail - Unable to test because no dialog was found
- ❌: Each modal dialog takes focus when opened (R): fail - Unable to test because no dialog was found
- ❌: Focus is not lost when each dialog closes (R): fail - Unable to test because no dialog was found
- ❌: Each modal dialog hides content behind it while open (R): fail - Unable to test because no dialog was found
Axe WCAG Failures (4) ❌
- (2x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
- (2x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (4) ⚠️
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 1 (Claude Sonnet 4.5)
Instruction set: Control
FAIL | Latency 0.00s cached
Axe WCAG: 4 | BP: 4 | $0.0257
Assertions ❌
- ❌: Each dialog has a dialog role (R): fail
- ❌: Each dialog can be closed by escape key (BP): fail - Unable to test because no dialog was found
- ❌: Each modal dialog traps keyboard focus (R): fail - Unable to test because no dialog was found
- ❌: Each modal dialog takes focus when opened (R): fail - Unable to test because no dialog was found
- ❌: Focus is not lost when each dialog closes (R): fail - Unable to test because no dialog was found
- ❌: Each modal dialog hides content behind it while open (R): fail - Unable to test because no dialog was found
Axe WCAG Failures (4) ❌
- (2x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
- (2x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (4) ⚠️
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 2 (Claude Sonnet 4.5)
Instruction set: Control
FAIL | Latency 0.00s cached
Axe WCAG: 6 | BP: 4 | $0.0261
Assertions ❌
- ❌: Each dialog has a dialog role (R): fail
- ❌: Each dialog can be closed by escape key (BP): fail - Unable to test because no dialog was found
- ❌: Each modal dialog traps keyboard focus (R): fail - Unable to test because no dialog was found
- ❌: Each modal dialog takes focus when opened (R): fail - Unable to test because no dialog was found
- ❌: Focus is not lost when each dialog closes (R): fail - Unable to test because no dialog was found
- ❌: Each modal dialog hides content behind it while open (R): fail - Unable to test because no dialog was found
Axe WCAG Failures (6) ❌
- (3x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
- (3x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (4) ⚠️
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 3 (Claude Sonnet 4.5)
Instruction set: Control
FAIL | Latency 0.00s cached
Axe WCAG: 6 | BP: 4 | $0.0264
Assertions ❌
- ❌: Each dialog has a dialog role (R): fail
- ❌: Each dialog can be closed by escape key (BP): fail - Unable to test because no dialog was found
- ❌: Each modal dialog traps keyboard focus (R): fail - Unable to test because no dialog was found
- ❌: Each modal dialog takes focus when opened (R): fail - Unable to test because no dialog was found
- ❌: Focus is not lost when each dialog closes (R): fail - Unable to test because no dialog was found
- ❌: Each modal dialog hides content behind it while open (R): fail - Unable to test because no dialog was found
Axe WCAG Failures (6) ❌
- (3x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
- (3x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (4) ⚠️
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 4 (Claude Sonnet 4.5)
Instruction set: Control
FAIL | Latency 0.00s cached
Axe WCAG: 6 | BP: 4 | $0.0262
Assertions ❌
- ❌: Each dialog has a dialog role (R): fail
- ❌: Each dialog can be closed by escape key (BP): fail - Unable to test because no dialog was found
- ❌: Each modal dialog traps keyboard focus (R): fail - Unable to test because no dialog was found
- ❌: Each modal dialog takes focus when opened (R): fail - Unable to test because no dialog was found
- ❌: Focus is not lost when each dialog closes (R): fail - Unable to test because no dialog was found
- ❌: Each modal dialog hides content behind it while open (R): fail - Unable to test because no dialog was found
Axe WCAG Failures (6) ❌
- (3x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
- (3x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (4) ⚠️
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 5 (Claude Sonnet 4.5)
Instruction set: Control
FAIL | Latency 0.00s cached
Axe WCAG: 4 | BP: 4 | $0.0256
Assertions ❌
- ❌: Each dialog has a dialog role (R): fail
- ❌: Each dialog can be closed by escape key (BP): fail - Unable to test because no dialog was found
- ❌: Each modal dialog traps keyboard focus (R): fail - Unable to test because no dialog was found
- ❌: Each modal dialog takes focus when opened (R): fail - Unable to test because no dialog was found
- ❌: Focus is not lost when each dialog closes (R): fail - Unable to test because no dialog was found
- ❌: Each modal dialog hides content behind it while open (R): fail - Unable to test because no dialog was found
Axe WCAG Failures (4) ❌
- (2x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
- (2x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (4) ⚠️
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 6 (Claude Sonnet 4.5)
Instruction set: Control
FAIL | Latency 0.00s cached
Axe WCAG: 4 | BP: 4 | $0.0264
Assertions ❌
- ❌: Each dialog has a dialog role (R): fail
- ❌: Each dialog can be closed by escape key (BP): fail - Unable to test because no dialog was found
- ❌: Each modal dialog traps keyboard focus (R): fail - Unable to test because no dialog was found
- ❌: Each modal dialog takes focus when opened (R): fail - Unable to test because no dialog was found
- ❌: Focus is not lost when each dialog closes (R): fail - Unable to test because no dialog was found
- ❌: Each modal dialog hides content behind it while open (R): fail - Unable to test because no dialog was found
Axe WCAG Failures (4) ❌
- (2x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
- (2x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (4) ⚠️
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 7 (Claude Sonnet 4.5)
Instruction set: Control
FAIL | Latency 0.00s cached
Axe WCAG: 4 | BP: 4 | $0.0267
Assertions ❌
- ❌: Each dialog has a dialog role (R): fail
- ❌: Each dialog can be closed by escape key (BP): fail - Unable to test because no dialog was found
- ❌: Each modal dialog traps keyboard focus (R): fail - Unable to test because no dialog was found
- ❌: Each modal dialog takes focus when opened (R): fail - Unable to test because no dialog was found
- ❌: Focus is not lost when each dialog closes (R): fail - Unable to test because no dialog was found
- ❌: Each modal dialog hides content behind it while open (R): fail - Unable to test because no dialog was found
Axe WCAG Failures (4) ❌
- (2x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
- (2x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (4) ⚠️
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 8 (Claude Sonnet 4.5)
Instruction set: Control
FAIL | Latency 0.00s cached
Axe WCAG: 4 | BP: 4 | $0.0264
Assertions ❌
- ❌: Each dialog has a dialog role (R): fail
- ❌: Each dialog can be closed by escape key (BP): fail - Unable to test because no dialog was found
- ❌: Each modal dialog traps keyboard focus (R): fail - Unable to test because no dialog was found
- ❌: Each modal dialog takes focus when opened (R): fail - Unable to test because no dialog was found
- ❌: Focus is not lost when each dialog closes (R): fail - Unable to test because no dialog was found
- ❌: Each modal dialog hides content behind it while open (R): fail - Unable to test because no dialog was found
Axe WCAG Failures (4) ❌
- (2x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
- (2x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (4) ⚠️
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 9 (Claude Sonnet 4.5)
Instruction set: Control
FAIL | Latency 0.00s cached
Axe WCAG: 4 | BP: 4 | $0.0266
Assertions ❌
- ❌: Each dialog has a dialog role (R): fail
- ❌: Each dialog can be closed by escape key (BP): fail - Unable to test because no dialog was found
- ❌: Each modal dialog traps keyboard focus (R): fail - Unable to test because no dialog was found
- ❌: Each modal dialog takes focus when opened (R): fail - Unable to test because no dialog was found
- ❌: Focus is not lost when each dialog closes (R): fail - Unable to test because no dialog was found
- ❌: Each modal dialog hides content behind it while open (R): fail - Unable to test because no dialog was found
Axe WCAG Failures (4) ❌
- (2x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
- (2x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (4) ⚠️
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 10 (Claude Sonnet 4.5)
Instruction set: Control
FAIL | Latency 0.00s cached
Axe WCAG: 4 | BP: 4 | $0.0266
Assertions ❌
- ❌: Each dialog has a dialog role (R): fail
- ❌: Each dialog can be closed by escape key (BP): fail - Unable to test because no dialog was found
- ❌: Each modal dialog traps keyboard focus (R): fail - Unable to test because no dialog was found
- ❌: Each modal dialog takes focus when opened (R): fail - Unable to test because no dialog was found
- ❌: Focus is not lost when each dialog closes (R): fail - Unable to test because no dialog was found
- ❌: Each modal dialog hides content behind it while open (R): fail - Unable to test because no dialog was found
Axe WCAG Failures (4) ❌
- (2x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
- (2x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (4) ⚠️
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 11 (Claude Sonnet 4.5)
Instruction set: Control
FAIL | Latency 0.00s cached
Axe WCAG: 4 | BP: 4 | $0.0263
Assertions ❌
- ❌: Each dialog has a dialog role (R): fail
- ❌: Each dialog can be closed by escape key (BP): fail - Unable to test because no dialog was found
- ❌: Each modal dialog traps keyboard focus (R): fail - Unable to test because no dialog was found
- ❌: Each modal dialog takes focus when opened (R): fail - Unable to test because no dialog was found
- ❌: Focus is not lost when each dialog closes (R): fail - Unable to test because no dialog was found
- ❌: Each modal dialog hides content behind it while open (R): fail - Unable to test because no dialog was found
Axe WCAG Failures (4) ❌
- (2x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
- (2x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (4) ⚠️
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 12 (Claude Sonnet 4.5)
Instruction set: Control
FAIL | Latency 0.00s cached
Axe WCAG: 4 | BP: 4 | $0.0255
Assertions ❌
- ❌: Each dialog has a dialog role (R): fail
- ❌: Each dialog can be closed by escape key (BP): fail - Unable to test because no dialog was found
- ❌: Each modal dialog traps keyboard focus (R): fail - Unable to test because no dialog was found
- ❌: Each modal dialog takes focus when opened (R): fail - Unable to test because no dialog was found
- ❌: Focus is not lost when each dialog closes (R): fail - Unable to test because no dialog was found
- ❌: Each modal dialog hides content behind it while open (R): fail - Unable to test because no dialog was found
Axe WCAG Failures (4) ❌
- (2x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
- (2x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (4) ⚠️
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 13 (Claude Sonnet 4.5)
Instruction set: Control
FAIL | Latency 0.00s cached
Axe WCAG: 4 | BP: 4 | $0.0266
Assertions ❌
- ❌: Each dialog has a dialog role (R): fail
- ❌: Each dialog can be closed by escape key (BP): fail - Unable to test because no dialog was found
- ❌: Each modal dialog traps keyboard focus (R): fail - Unable to test because no dialog was found
- ❌: Each modal dialog takes focus when opened (R): fail - Unable to test because no dialog was found
- ❌: Focus is not lost when each dialog closes (R): fail - Unable to test because no dialog was found
- ❌: Each modal dialog hides content behind it while open (R): fail - Unable to test because no dialog was found
Axe WCAG Failures (4) ❌
- (2x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
- (2x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (4) ⚠️
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 14 (Claude Sonnet 4.5)
Instruction set: Control
FAIL | Latency 0.00s cached
Axe WCAG: 4 | BP: 4 | $0.0265
Assertions ❌
- ❌: Each dialog has a dialog role (R): fail
- ❌: Each dialog can be closed by escape key (BP): fail - Unable to test because no dialog was found
- ❌: Each modal dialog traps keyboard focus (R): fail - Unable to test because no dialog was found
- ❌: Each modal dialog takes focus when opened (R): fail - Unable to test because no dialog was found
- ❌: Focus is not lost when each dialog closes (R): fail - Unable to test because no dialog was found
- ❌: Each modal dialog hides content behind it while open (R): fail - Unable to test because no dialog was found
Axe WCAG Failures (4) ❌
- (2x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
- (2x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (4) ⚠️
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 15 (Claude Sonnet 4.5)
Instruction set: Control
FAIL | Latency 0.00s cached
Axe WCAG: 4 | BP: 4 | $0.0270
Assertions ❌
- ❌: Each dialog has a dialog role (R): fail
- ❌: Each dialog can be closed by escape key (BP): fail - Unable to test because no dialog was found
- ❌: Each modal dialog traps keyboard focus (R): fail - Unable to test because no dialog was found
- ❌: Each modal dialog takes focus when opened (R): fail - Unable to test because no dialog was found
- ❌: Focus is not lost when each dialog closes (R): fail - Unable to test because no dialog was found
- ❌: Each modal dialog hides content behind it while open (R): fail - Unable to test because no dialog was found
Axe WCAG Failures (4) ❌
- (2x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
- (2x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (4) ⚠️
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 16 (Claude Sonnet 4.5)
Instruction set: Control
FAIL | Latency 0.00s cached
Axe WCAG: 4 | BP: 4 | $0.0255
Assertions ❌
- ❌: Each dialog has a dialog role (R): fail
- ❌: Each dialog can be closed by escape key (BP): fail - Unable to test because no dialog was found
- ❌: Each modal dialog traps keyboard focus (R): fail - Unable to test because no dialog was found
- ❌: Each modal dialog takes focus when opened (R): fail - Unable to test because no dialog was found
- ❌: Focus is not lost when each dialog closes (R): fail - Unable to test because no dialog was found
- ❌: Each modal dialog hides content behind it while open (R): fail - Unable to test because no dialog was found
Axe WCAG Failures (4) ❌
- (2x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
- (2x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (4) ⚠️
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 17 (Claude Sonnet 4.5)
Instruction set: Control
FAIL | Latency 0.00s cached
Axe WCAG: 4 | BP: 4 | $0.0267
Assertions ❌
- ❌: Each dialog has a dialog role (R): fail
- ❌: Each dialog can be closed by escape key (BP): fail - Unable to test because no dialog was found
- ❌: Each modal dialog traps keyboard focus (R): fail - Unable to test because no dialog was found
- ❌: Each modal dialog takes focus when opened (R): fail - Unable to test because no dialog was found
- ❌: Focus is not lost when each dialog closes (R): fail - Unable to test because no dialog was found
- ❌: Each modal dialog hides content behind it while open (R): fail - Unable to test because no dialog was found
Axe WCAG Failures (4) ❌
- (2x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
- (2x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (4) ⚠️
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 18 (Claude Sonnet 4.5)
Instruction set: Control
FAIL | Latency 0.00s cached
Axe WCAG: 4 | BP: 4 | $0.0266
Assertions ❌
- ❌: Each dialog has a dialog role (R): fail
- ❌: Each dialog can be closed by escape key (BP): fail - Unable to test because no dialog was found
- ❌: Each modal dialog traps keyboard focus (R): fail - Unable to test because no dialog was found
- ❌: Each modal dialog takes focus when opened (R): fail - Unable to test because no dialog was found
- ❌: Focus is not lost when each dialog closes (R): fail - Unable to test because no dialog was found
- ❌: Each modal dialog hides content behind it while open (R): fail - Unable to test because no dialog was found
Axe WCAG Failures (4) ❌
- (2x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
- (2x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (4) ⚠️
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 19 (Claude Sonnet 4.5)
Instruction set: Control
FAIL | Latency 0.00s cached
Axe WCAG: 6 | BP: 4 | $0.0260
Assertions ❌
- ❌: Each dialog has a dialog role (R): fail
- ❌: Each dialog can be closed by escape key (BP): fail - Unable to test because no dialog was found
- ❌: Each modal dialog traps keyboard focus (R): fail - Unable to test because no dialog was found
- ❌: Each modal dialog takes focus when opened (R): fail - Unable to test because no dialog was found
- ❌: Focus is not lost when each dialog closes (R): fail - Unable to test because no dialog was found
- ❌: Each modal dialog hides content behind it while open (R): fail - Unable to test because no dialog was found
Axe WCAG Failures (6) ❌
- (3x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
- (3x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (4) ⚠️
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 20 (Claude Sonnet 4.5)
Instruction set: Control
FAIL | Latency 0.00s cached
Axe WCAG: 4 | BP: 4 | $0.0257
Assertions ❌
- ❌: Each dialog has a dialog role (R): fail
- ❌: Each dialog can be closed by escape key (BP): fail - Unable to test because no dialog was found
- ❌: Each modal dialog traps keyboard focus (R): fail - Unable to test because no dialog was found
- ❌: Each modal dialog takes focus when opened (R): fail - Unable to test because no dialog was found
- ❌: Focus is not lost when each dialog closes (R): fail - Unable to test because no dialog was found
- ❌: Each modal dialog hides content behind it while open (R): fail - Unable to test because no dialog was found
Axe WCAG Failures (4) ❌
- (2x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
- (2x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (4) ⚠️
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 21 (Claude Sonnet 4.5)
Instruction set: Control
FAIL | Latency 0.00s cached
Axe WCAG: 4 | BP: 4 | $0.0256
Assertions ❌
- ❌: Each dialog has a dialog role (R): fail
- ❌: Each dialog can be closed by escape key (BP): fail - Unable to test because no dialog was found
- ❌: Each modal dialog traps keyboard focus (R): fail - Unable to test because no dialog was found
- ❌: Each modal dialog takes focus when opened (R): fail - Unable to test because no dialog was found
- ❌: Focus is not lost when each dialog closes (R): fail - Unable to test because no dialog was found
- ❌: Each modal dialog hides content behind it while open (R): fail - Unable to test because no dialog was found
Axe WCAG Failures (4) ❌
- (2x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
- (2x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (4) ⚠️
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 22 (Claude Sonnet 4.5)
Instruction set: Control
FAIL | Latency 0.00s cached
Axe WCAG: 4 | BP: 4 | $0.0259
Assertions ❌
- ❌: Each dialog has a dialog role (R): fail
- ❌: Each dialog can be closed by escape key (BP): fail - Unable to test because no dialog was found
- ❌: Each modal dialog traps keyboard focus (R): fail - Unable to test because no dialog was found
- ❌: Each modal dialog takes focus when opened (R): fail - Unable to test because no dialog was found
- ❌: Focus is not lost when each dialog closes (R): fail - Unable to test because no dialog was found
- ❌: Each modal dialog hides content behind it while open (R): fail - Unable to test because no dialog was found
Axe WCAG Failures (4) ❌
- (2x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
- (2x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (4) ⚠️
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 23 (Claude Sonnet 4.5)
Instruction set: Control
FAIL | Latency 0.00s cached
Axe WCAG: 6 | BP: 4 | $0.0256
Assertions ❌
- ❌: Each dialog has a dialog role (R): fail
- ❌: Each dialog can be closed by escape key (BP): fail - Unable to test because no dialog was found
- ❌: Each modal dialog traps keyboard focus (R): fail - Unable to test because no dialog was found
- ❌: Each modal dialog takes focus when opened (R): fail - Unable to test because no dialog was found
- ❌: Focus is not lost when each dialog closes (R): fail - Unable to test because no dialog was found
- ❌: Each modal dialog hides content behind it while open (R): fail - Unable to test because no dialog was found
Axe WCAG Failures (6) ❌
- (3x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
- (3x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (4) ⚠️
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 24 (Claude Sonnet 4.5)
Instruction set: Control
FAIL | Latency 0.00s cached
Axe WCAG: 4 | BP: 4 | $0.0273
Assertions ❌
- ❌: Each dialog has a dialog role (R): fail
- ❌: Each dialog can be closed by escape key (BP): fail - Unable to test because no dialog was found
- ❌: Each modal dialog traps keyboard focus (R): fail - Unable to test because no dialog was found
- ❌: Each modal dialog takes focus when opened (R): fail - Unable to test because no dialog was found
- ❌: Focus is not lost when each dialog closes (R): fail - Unable to test because no dialog was found
- ❌: Each modal dialog hides content behind it while open (R): fail - Unable to test because no dialog was found
Axe WCAG Failures (4) ❌
- (2x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
- (2x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (4) ⚠️
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
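The FAIL verdicts in the Control samples above are consistent with a simple rule, sketched here as an assumption rather than the benchmark's actual implementation: a sample passes only when axe reports zero WCAG failures and every required (R) assertion passes, while best-practice (BP) findings are ignored. The record shape, field names, and the `verdict` function are hypothetical, and treating (BP) assertions like axe best-practice issues is an inference from the "does not affect pass/fail" note.

```typescript
// Hypothetical record shape; field names are illustrative, not the benchmark's schema.
type Level = "R" | "BP"; // R = required, BP = best practice (as labelled in the log)

interface AssertionResult {
  name: string;
  level: Level;
  passed: boolean;
}

interface SampleResult {
  axeWcagFailures: number;
  axeBestPracticeIssues: number;
  assertions: AssertionResult[];
}

// Assumed rule: PASS requires zero axe WCAG failures and no failed (R) assertion.
// Best-practice findings (axe or assertion) never change the verdict.
function verdict(sample: SampleResult): "PASS" | "FAIL" {
  const requiredFailed = sample.assertions.some(
    (a) => a.level === "R" && !a.passed
  );
  return sample.axeWcagFailures === 0 && !requiredFailed ? "PASS" : "FAIL";
}

// Mirrors Sample 24 (Control): 4 axe WCAG failures, 4 BP issues, all assertions failing.
const controlSample24: SampleResult = {
  axeWcagFailures: 4,
  axeBestPracticeIssues: 4,
  assertions: [
    { name: "Each dialog has a dialog role", level: "R", passed: false },
    { name: "Each dialog can be closed by escape key", level: "BP", passed: false },
    { name: "Each modal dialog traps keyboard focus", level: "R", passed: false },
    { name: "Each modal dialog takes focus when opened", level: "R", passed: false },
    { name: "Focus is not lost when each dialog closes", level: "R", passed: false },
    { name: "Each modal dialog hides content behind it while open", level: "R", passed: false },
  ],
};

// Hypothetical sample (not taken from the log) with best-practice findings only.
const bestPracticeOnly: SampleResult = {
  axeWcagFailures: 0,
  axeBestPracticeIssues: 4,
  assertions: [
    { name: "Each dialog has a dialog role", level: "R", passed: true },
    { name: "Each dialog can be closed by escape key", level: "BP", passed: false },
  ],
};

console.log(verdict(controlSample24), verdict(bestPracticeOnly));
```

Under this assumed rule, a single failed (R) assertion is enough to fail a sample even when the axe WCAG count is 0.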
Sample 0 (Claude Sonnet 4.5)
Instruction set: 1. Basic
FAIL | Latency 0.00s cached
Axe WCAG: 0 | BP: 4 | $0.0341
Assertions ❌
- ✅: Each dialog has a dialog role (R): pass
- ✅: Each dialog can be closed by escape key (BP): pass
- ✅: Each modal dialog traps keyboard focus (R): pass
- ✅: Each modal dialog takes focus when opened (R): pass
- ✅: Focus is not lost when each dialog closes (R): pass
- ❌: Each modal dialog hides content behind it while open (R): fail
Axe Best Practice Issues (4) ⚠️
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 1 (Claude Sonnet 4.5)
Instruction set: 1. Basic
FAIL | Latency 0.00s cached
Axe WCAG: 0 | BP: 4 | $0.0334
Assertions ❌
- ✅: Each dialog has a dialog role (R): pass
- ✅: Each dialog can be closed by escape key (BP): pass
- ✅: Each modal dialog traps keyboard focus (R): pass
- ✅: Each modal dialog takes focus when opened (R): pass
- ✅: Focus is not lost when each dialog closes (R): pass
- ❌: Each modal dialog hides content behind it while open (R): fail
Axe Best Practice Issues (4) ⚠️
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 2 (Claude Sonnet 4.5)
Instruction set: 1. Basic
FAIL | Latency 0.00s cached
Axe WCAG: 0 | BP: 4 | $0.0341
Assertions ❌
- ✅: Each dialog has a dialog role (R): pass
- ✅: Each dialog can be closed by escape key (BP): pass
- ✅: Each modal dialog traps keyboard focus (R): pass
- ✅: Each modal dialog takes focus when opened (R): pass
- ✅: Focus is not lost when each dialog closes (R): pass
- ❌: Each modal dialog hides content behind it while open (R): fail
Axe Best Practice Issues (4) ⚠️
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 3 (Claude Sonnet 4.5)
Instruction set: 1. Basic
FAIL | Latency 0.00s cached
Axe WCAG: 0 | BP: 4 | $0.0348
Assertions ❌
- ✅: Each dialog has a dialog role (R): pass
- ✅: Each dialog can be closed by escape key (BP): pass
- ✅: Each modal dialog traps keyboard focus (R): pass
- ✅: Each modal dialog takes focus when opened (R): pass
- ✅: Focus is not lost when each dialog closes (R): pass
- ❌: Each modal dialog hides content behind it while open (R): fail
Axe Best Practice Issues (4) ⚠️
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 4 (Claude Sonnet 4.5)
Instruction set: 1. Basic
FAIL | Latency 0.00s cached
Axe WCAG: 0 | BP: 4 | $0.0344
Assertions ❌
- ✅: Each dialog has a dialog role (R): pass
- ✅: Each dialog can be closed by escape key (BP): pass
- ✅: Each modal dialog traps keyboard focus (R): pass
- ✅: Each modal dialog takes focus when opened (R): pass
- ✅: Focus is not lost when each dialog closes (R): pass
- ❌: Each modal dialog hides content behind it while open (R): fail
Axe Best Practice Issues (4) ⚠️
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 0 (Claude Sonnet 4.5)
Instruction set: 0. Minimal
FAIL | Latency 0.00s cached
Axe WCAG: 0 | BP: 4 | $0.0311
Assertions ❌
- ✅: Each dialog has a dialog role (R): pass
- ✅: Each dialog can be closed by escape key (BP): pass
- ✅: Each modal dialog traps keyboard focus (R): pass
- ✅: Each modal dialog takes focus when opened (R): pass
- ✅: Focus is not lost when each dialog closes (R): pass
- ❌: Each modal dialog hides content behind it while open (R): fail
Axe Best Practice Issues (4) ⚠️
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 1 (Claude Sonnet 4.5)
Instruction set: 0. Minimal
FAIL | Latency 0.00s cached
Axe WCAG: 0 | BP: 4 | $0.0321
Assertions ❌
- ✅: Each dialog has a dialog role (R): pass
- ✅: Each dialog can be closed by escape key (BP): pass
- ✅: Each modal dialog traps keyboard focus (R): pass
- ✅: Each modal dialog takes focus when opened (R): pass
- ✅: Focus is not lost when each dialog closes (R): pass
- ❌: Each modal dialog hides content behind it while open (R): fail
Axe Best Practice Issues (4) ⚠️
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 2 (Claude Sonnet 4.5)
Instruction set: 0. Minimal
FAIL | Latency 0.00s cached
Axe WCAG: 0 | BP: 4 | $0.0306
Assertions ❌
- ✅: Each dialog has a dialog role (R): pass
- ✅: Each dialog can be closed by escape key (BP): pass
- ✅: Each modal dialog traps keyboard focus (R): pass
- ✅: Each modal dialog takes focus when opened (R): pass
- ✅: Focus is not lost when each dialog closes (R): pass
- ❌: Each modal dialog hides content behind it while open (R): fail
Axe Best Practice Issues (4) ⚠️
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 3 (Claude Sonnet 4.5)
Instruction set: 0. Minimal
FAIL | Latency 0.00s cached
Axe WCAG: 0 | BP: 4 | $0.0330
Assertions ❌
- ✅: Each dialog has a dialog role (R): pass
- ✅: Each dialog can be closed by escape key (BP): pass
- ✅: Each modal dialog traps keyboard focus (R): pass
- ✅: Each modal dialog takes focus when opened (R): pass
- ✅: Focus is not lost when each dialog closes (R): pass
- ❌: Each modal dialog hides content behind it while open (R): fail
Axe Best Practice Issues (4) ⚠️
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 4 (Claude Sonnet 4.5)
Instruction set: 0. Minimal
FAIL | Latency 0.00s cached
Axe WCAG: 0 | BP: 4 | $0.0324
Assertions ❌
- ✅: Each dialog has a dialog role (R): pass
- ✅: Each dialog can be closed by escape key (BP): pass
- ✅: Each modal dialog traps keyboard focus (R): pass
- ✅: Each modal dialog takes focus when opened (R): pass
- ✅: Focus is not lost when each dialog closes (R): pass
- ❌: Each modal dialog hides content behind it while open (R): fail
Axe Best Practice Issues (4) ⚠️
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 0 (Claude Sonnet 4.5)
Instruction set: 2. Detailed Instructions
FAIL | Latency 0.00s cached
Axe WCAG: 0 | $0.0632
Assertions ❌
- ✅: Each dialog has a dialog role (R): pass
- ✅: Each dialog can be closed by escape key (BP): pass
- ✅: Each modal dialog traps keyboard focus (R): pass
- ✅: Each modal dialog takes focus when opened (R): pass
- ✅: Focus is not lost when each dialog closes (R): pass
- ❌: Each modal dialog hides content behind it while open (R): fail
Sample 1 (Claude Sonnet 4.5)
Instruction set: 2. Detailed Instructions
FAIL | Latency 0.00s cached
Axe WCAG: 0 | $0.0636
Assertions ❌
- ✅: Each dialog has a dialog role (R): pass
- ✅: Each dialog can be closed by escape key (BP): pass
- ✅: Each modal dialog traps keyboard focus (R): pass
- ✅: Each modal dialog takes focus when opened (R): pass
- ✅: Focus is not lost when each dialog closes (R): pass
- ❌: Each modal dialog hides content behind it while open (R): fail
Sample 2 (Claude Sonnet 4.5)
Instruction set: 2. Detailed Instructions
FAIL | Latency 35.30s
Axe WCAG: 0 | $0.0600
Assertions ❌
- ✅: Each dialog has a dialog role (R): pass
- ✅: Each dialog can be closed by escape key (BP): pass
- ✅: Each modal dialog traps keyboard focus (R): pass
- ✅: Each modal dialog takes focus when opened (R): pass
- ✅: Focus is not lost when each dialog closes (R): pass
- ❌: Each modal dialog hides content behind it while open (R): fail
Sample 3 (Claude Sonnet 4.5)
Instruction set: 2. Detailed Instructions
FAIL | Latency 36.76s
Axe WCAG: 0 | $0.0634
Assertions ❌
- ✅: Each dialog has a dialog role (R): pass
- ✅: Each dialog can be closed by escape key (BP): pass
- ✅: Each modal dialog traps keyboard focus (R): pass
- ✅: Each modal dialog takes focus when opened (R): pass
- ✅: Focus is not lost when each dialog closes (R): pass
- ❌: Each modal dialog hides content behind it while open (R): fail
Sample 4 (Claude Sonnet 4.5)
Instruction set: 2. Detailed Instructions
FAIL | Latency 37.90s
Axe WCAG: 0 | $0.0643
Assertions ❌
- ✅: Each dialog has a dialog role (R): pass
- ✅: Each dialog can be closed by escape key (BP): pass
- ✅: Each modal dialog traps keyboard focus (R): pass
- ✅: Each modal dialog takes focus when opened (R): pass
- ✅: Focus is not lost when each dialog closes (R): pass
- ❌: Each modal dialog hides content behind it while open (R): fail
Gemini 3 Flash Preview
Pass rate by instruction set: 0% | 100% | 100% | 0%
Aggregates are shown when filtering to a specific instruction set.
| Samples | Passes | pass@1 | pass@5 | pass@10 |
|---|---|---|---|---|
| 25 | 0 | 0% | 0% | 0% |
| 5 | 5 | 100% | 100% | 100% |
| 5 | 5 | 100% | 100% | 100% |
| 5 | 0 | 0% | 0% | 0% |
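The pass@k figures above are consistent with the commonly used combinatorial estimator, 1 - C(n - c, k) / C(n, k), where n is the number of samples and c the number of passes. The report does not show its own implementation, so the following is only a minimal sketch under that assumption. The function name `pass_at_k` and the choice to clamp k to n when fewer than k samples exist are assumptions made for illustration.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Estimate the chance that at least one of k samples, drawn without
    replacement from n samples containing c passes, is a pass."""
    if n < 1 or k < 1 or not 0 <= c <= n:
        raise ValueError("require n >= 1, k >= 1 and 0 <= c <= n")
    # Assumption: when fewer than k samples exist, use all n of them.
    k = min(k, n)
    # comb(n - c, k) is 0 whenever k > n - c, so the result is then 1.0.
    return 1.0 - comb(n - c, k) / comb(n, k)


if __name__ == "__main__":
    # (samples, passes) pairs taken from the aggregates above.
    for n, c in [(25, 0), (5, 5), (5, 0)]:
        row = [f"{pass_at_k(n, c, k):.0%}" for k in (1, 5, 10)]
        print(n, c, row)
```

With 0 passes the estimate is 0% for every k, and with 5 passes out of 5 samples it is 100% for every k, which matches the rows shown.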
Sample 0 (Gemini 3 Flash Preview)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 2 | BP: 6
Assertions ❌
- ❌: Each dialog has a dialog role (R): fail
- ❌: Each dialog can be closed by escape key (BP): fail - Unable to test because no dialog was found
- ❌: Each modal dialog traps keyboard focus (R): fail - Unable to test because no dialog was found
- ❌: Each modal dialog takes focus when opened (R): fail - Unable to test because no dialog was found
- ❌: Focus is not lost when each dialog closes (R): fail - Unable to test because no dialog was found
- ❌: Each modal dialog hides content behind it while open (R): fail - Unable to test because no dialog was found
Axe WCAG Failures (2) ❌
- (1x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
- (1x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (6) ⚠️
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 1 (Gemini 3 Flash Preview)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 2 | BP: 6
Assertions ❌
- ❌: Each dialog has a dialog role (R): fail
- ❌: Each dialog can be closed by escape key (BP): fail - Unable to test because no dialog was found
- ❌: Each modal dialog traps keyboard focus (R): fail - Unable to test because no dialog was found
- ❌: Each modal dialog takes focus when opened (R): fail - Unable to test because no dialog was found
- ❌: Focus is not lost when each dialog closes (R): fail - Unable to test because no dialog was found
- ❌: Each modal dialog hides content behind it while open (R): fail - Unable to test because no dialog was found
Axe WCAG Failures (2) ❌
- (1x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
- (1x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (6) ⚠️
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 2 (Gemini 3 Flash Preview)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 2 | BP: 6
Assertions ❌
- ❌: Each dialog has a dialog role (R): fail
- ❌: Each dialog can be closed by escape key (BP): fail - Unable to test because no dialog was found
- ❌: Each modal dialog traps keyboard focus (R): fail - Unable to test because no dialog was found
- ❌: Each modal dialog takes focus when opened (R): fail - Unable to test because no dialog was found
- ❌: Focus is not lost when each dialog closes (R): fail - Unable to test because no dialog was found
- ❌: Each modal dialog hides content behind it while open (R): fail - Unable to test because no dialog was found
Axe WCAG Failures (2) ❌
- (1x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
- (1x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (6) ⚠️
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 3 (Gemini 3 Flash Preview)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 2 | BP: 6
Assertions ❌
- ❌: Each dialog has a dialog role (R): fail
- ❌: Each dialog can be closed by escape key (BP): fail - Unable to test because no dialog was found
- ❌: Each modal dialog traps keyboard focus (R): fail - Unable to test because no dialog was found
- ❌: Each modal dialog takes focus when opened (R): fail - Unable to test because no dialog was found
- ❌: Focus is not lost when each dialog closes (R): fail - Unable to test because no dialog was found
- ❌: Each modal dialog hides content behind it while open (R): fail - Unable to test because no dialog was found
Axe WCAG Failures (2) ❌
- (1x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
- (1x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (6) ⚠️
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 4 (Gemini 3 Flash Preview)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 2 | BP: 4
Assertions ❌
- ❌: Each dialog has a dialog role (R): fail
- ❌: Each dialog can be closed by escape key (BP): fail - Unable to test because no dialog was found
- ❌: Each modal dialog traps keyboard focus (R): fail - Unable to test because no dialog was found
- ❌: Each modal dialog takes focus when opened (R): fail - Unable to test because no dialog was found
- ❌: Focus is not lost when each dialog closes (R): fail - Unable to test because no dialog was found
- ❌: Each modal dialog hides content behind it while open (R): fail - Unable to test because no dialog was found
Axe WCAG Failures (2) ❌
- (1x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
- (1x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (4) ⚠️
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 5 (Gemini 3 Flash Preview)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 2 | BP: 6
Assertions ❌
- ❌: Each dialog has a dialog role (R): fail
- ❌: Each dialog can be closed by escape key (BP): fail - Unable to test because no dialog was found
- ❌: Each modal dialog traps keyboard focus (R): fail - Unable to test because no dialog was found
- ❌: Each modal dialog takes focus when opened (R): fail - Unable to test because no dialog was found
- ❌: Focus is not lost when each dialog closes (R): fail - Unable to test because no dialog was found
- ❌: Each modal dialog hides content behind it while open (R): fail - Unable to test because no dialog was found
Axe WCAG Failures (2) ❌
- (1x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
- (1x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (6) ⚠️
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 6 (Gemini 3 Flash Preview)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 2 | BP: 4
Assertions ❌
- ❌: Each dialog has a dialog role (R): fail
- ❌: Each dialog can be closed by escape key (BP): fail - Unable to test because no dialog was found
- ❌: Each modal dialog traps keyboard focus (R): fail - Unable to test because no dialog was found
- ❌: Each modal dialog takes focus when opened (R): fail - Unable to test because no dialog was found
- ❌: Focus is not lost when each dialog closes (R): fail - Unable to test because no dialog was found
- ❌: Each modal dialog hides content behind it while open (R): fail - Unable to test because no dialog was found
Axe WCAG Failures (2) ❌
- (1x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
- (1x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (4) ⚠️
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 7 (Gemini 3 Flash Preview)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 0 | BP: 6
Assertions ❌
- ❌: Each dialog has a dialog role (R): fail
- ❌: Each dialog can be closed by escape key (BP): fail - Unable to test because no dialog was found
- ❌: Each modal dialog traps keyboard focus (R): fail - Unable to test because no dialog was found
- ❌: Each modal dialog takes focus when opened (R): fail - Unable to test because no dialog was found
- ❌: Focus is not lost when each dialog closes (R): fail - Unable to test because no dialog was found
- ❌: Each modal dialog hides content behind it while open (R): fail - Unable to test because no dialog was found
Axe Best Practice Issues (6) ⚠️
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 8 (Gemini 3 Flash Preview)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 2 | BP: 4
Assertions ❌
- ❌: Each dialog has a dialog role (R): fail
- ❌: Each dialog can be closed by escape key (BP): fail - Unable to test because no dialog was found
- ❌: Each modal dialog traps keyboard focus (R): fail - Unable to test because no dialog was found
- ❌: Each modal dialog takes focus when opened (R): fail - Unable to test because no dialog was found
- ❌: Focus is not lost when each dialog closes (R): fail - Unable to test because no dialog was found
- ❌: Each modal dialog hides content behind it while open (R): fail - Unable to test because no dialog was found
Axe WCAG Failures (2) ❌
- (1x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
- (1x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (4) ⚠️
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 9 (Gemini 3 Flash Preview)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 2 | BP: 6
Assertions ❌
- ❌: Each dialog has a dialog role (R): fail
- ❌: Each dialog can be closed by escape key (BP): fail - Unable to test because no dialog was found
- ❌: Each modal dialog traps keyboard focus (R): fail - Unable to test because no dialog was found
- ❌: Each modal dialog takes focus when opened (R): fail - Unable to test because no dialog was found
- ❌: Focus is not lost when each dialog closes (R): fail - Unable to test because no dialog was found
- ❌: Each modal dialog hides content behind it while open (R): fail - Unable to test because no dialog was found
Axe WCAG Failures (2) ❌
- (1x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
- (1x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (6) ⚠️
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 10 (Gemini 3 Flash Preview)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 2 | BP: 6
Assertions ❌
- ❌: Each dialog has a dialog role (R): fail
- ❌: Each dialog can be closed by escape key (BP): fail - Unable to test because no dialog was found
- ❌: Each modal dialog traps keyboard focus (R): fail - Unable to test because no dialog was found
- ❌: Each modal dialog takes focus when opened (R): fail - Unable to test because no dialog was found
- ❌: Focus is not lost when each dialog closes (R): fail - Unable to test because no dialog was found
- ❌: Each modal dialog hides content behind it while open (R): fail - Unable to test because no dialog was found
Axe WCAG Failures (2) ❌
- (1x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
- (1x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (6) ⚠️
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 11 (Gemini 3 Flash Preview)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 0 | BP: 4
Assertions ❌
- ❌: Each dialog has a dialog role (R): fail
- ❌: Each dialog can be closed by escape key (BP): fail - Unable to test because no dialog was found
- ❌: Each modal dialog traps keyboard focus (R): fail - Unable to test because no dialog was found
- ❌: Each modal dialog takes focus when opened (R): fail - Unable to test because no dialog was found
- ❌: Focus is not lost when each dialog closes (R): fail - Unable to test because no dialog was found
- ❌: Each modal dialog hides content behind it while open (R): fail - Unable to test because no dialog was found
Axe Best Practice Issues (4) ⚠️
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 12 (Gemini 3 Flash Preview)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 2 | BP: 6
Assertions ❌
- ❌: Each dialog has a dialog role (R): fail
- ❌: Each dialog can be closed by escape key (BP): fail - Unable to test because no dialog was found
- ❌: Each modal dialog traps keyboard focus (R): fail - Unable to test because no dialog was found
- ❌: Each modal dialog takes focus when opened (R): fail - Unable to test because no dialog was found
- ❌: Focus is not lost when each dialog closes (R): fail - Unable to test because no dialog was found
- ❌: Each modal dialog hides content behind it while open (R): fail - Unable to test because no dialog was found
Axe WCAG Failures (2) ❌
- (1x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
- (1x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (6) ⚠️
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 13 (Gemini 3 Flash Preview)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 2 | BP: 6
Assertions ❌
- ❌: Each dialog has a dialog role (R): fail
- ❌: Each dialog can be closed by escape key (BP): fail - Unable to test because no dialog was found
- ❌: Each modal dialog traps keyboard focus (R): fail - Unable to test because no dialog was found
- ❌: Each modal dialog takes focus when opened (R): fail - Unable to test because no dialog was found
- ❌: Focus is not lost when each dialog closes (R): fail - Unable to test because no dialog was found
- ❌: Each modal dialog hides content behind it while open (R): fail - Unable to test because no dialog was found
Axe WCAG Failures (2) ❌
- (1x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
- (1x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (6) ⚠️
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 14 (Gemini 3 Flash Preview)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 2 | BP: 6
Assertions ❌
- ❌: Each dialog has a dialog role (R): fail
- ❌: Each dialog can be closed by escape key (BP): fail - Unable to test because no dialog was found
- ❌: Each modal dialog traps keyboard focus (R): fail - Unable to test because no dialog was found
- ❌: Each modal dialog takes focus when opened (R): fail - Unable to test because no dialog was found
- ❌: Focus is not lost when each dialog closes (R): fail - Unable to test because no dialog was found
- ❌: Each modal dialog hides content behind it while open (R): fail - Unable to test because no dialog was found
Axe WCAG Failures (2) ❌
- (1x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
- (1x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (6) ⚠️
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 15 (Gemini 3 Flash Preview)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 2 | BP: 6
Assertions ❌
- ❌: Each dialog has a dialog role (R): fail
- ❌: Each dialog can be closed by escape key (BP): fail - Unable to test because no dialog was found
- ❌: Each modal dialog traps keyboard focus (R): fail - Unable to test because no dialog was found
- ❌: Each modal dialog takes focus when opened (R): fail - Unable to test because no dialog was found
- ❌: Focus is not lost when each dialog closes (R): fail - Unable to test because no dialog was found
- ❌: Each modal dialog hides content behind it while open (R): fail - Unable to test because no dialog was found
Axe WCAG Failures (2) ❌
- (1x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
- (1x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (6) ⚠️
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 16 (Gemini 3 Flash Preview)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 2 | BP: 4
Assertions ❌
- ❌: Each dialog has a dialog role (R): fail
- ❌: Each dialog can be closed by escape key (BP): fail - Unable to test because no dialog was found
- ❌: Each modal dialog traps keyboard focus (R): fail - Unable to test because no dialog was found
- ❌: Each modal dialog takes focus when opened (R): fail - Unable to test because no dialog was found
- ❌: Focus is not lost when each dialog closes (R): fail - Unable to test because no dialog was found
- ❌: Each modal dialog hides content behind it while open (R): fail - Unable to test because no dialog was found
Axe WCAG Failures (2) ❌
- (1x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
- (1x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (4) ⚠️
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 17 (Gemini 3 Flash Preview)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 2 | BP: 4
Assertions ❌
- ❌: Each dialog has a dialog role (R): fail
- ❌: Each dialog can be closed by escape key (BP): fail - Unable to test because no dialog was found
- ❌: Each modal dialog traps keyboard focus (R): fail - Unable to test because no dialog was found
- ❌: Each modal dialog takes focus when opened (R): fail - Unable to test because no dialog was found
- ❌: Focus is not lost when each dialog closes (R): fail - Unable to test because no dialog was found
- ❌: Each modal dialog hides content behind it while open (R): fail - Unable to test because no dialog was found
Axe WCAG Failures (2) ❌
- (1x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
- (1x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (4) ⚠️
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 18 (Gemini 3 Flash Preview)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 2 | BP: 2
Assertions ❌
- ❌: Each dialog has a dialog role (R): fail
- ❌: Each dialog can be closed by escape key (BP): fail - Unable to test because no dialog was found
- ❌: Each modal dialog traps keyboard focus (R): fail - Unable to test because no dialog was found
- ❌: Each modal dialog takes focus when opened (R): fail - Unable to test because no dialog was found
- ❌: Focus is not lost when each dialog closes (R): fail - Unable to test because no dialog was found
- ❌: Each modal dialog hides content behind it while open (R): fail - Unable to test because no dialog was found
Axe WCAG Failures (2) ❌
- (1x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
- (1x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (2) ⚠️
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 19 (Gemini 3 Flash Preview)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 4 | BP: 6
Assertions ❌
- ❌: Each dialog has a dialog role (R): fail
- ❌: Each dialog can be closed by escape key (BP): fail - Unable to test because no dialog was found
- ❌: Each modal dialog traps keyboard focus (R): fail - Unable to test because no dialog was found
- ❌: Each modal dialog takes focus when opened (R): fail - Unable to test because no dialog was found
- ❌: Focus is not lost when each dialog closes (R): fail - Unable to test because no dialog was found
- ❌: Each modal dialog hides content behind it while open (R): fail - Unable to test because no dialog was found
Axe WCAG Failures (4) ❌
- (2x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
- (2x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (6) ⚠️
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 20 (Gemini 3 Flash Preview)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 2 | BP: 4
Assertions ❌
- ❌: Each dialog has a dialog role (R): fail
- ❌: Each dialog can be closed by escape key (BP): fail - Unable to test because no dialog was found
- ❌: Each modal dialog traps keyboard focus (R): fail - Unable to test because no dialog was found
- ❌: Each modal dialog takes focus when opened (R): fail - Unable to test because no dialog was found
- ❌: Focus is not lost when each dialog closes (R): fail - Unable to test because no dialog was found
- ❌: Each modal dialog hides content behind it while open (R): fail - Unable to test because no dialog was found
Axe WCAG Failures (2) ❌
- (1x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
- (1x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (4) ⚠️
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 21 (Gemini 3 Flash Preview)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 2 | BP: 4
Assertions ❌
- ❌: Each dialog has a dialog role (R): fail
- ❌: Each dialog can be closed by escape key (BP): fail - Unable to test because no dialog was found
- ❌: Each modal dialog traps keyboard focus (R): fail - Unable to test because no dialog was found
- ❌: Each modal dialog takes focus when opened (R): fail - Unable to test because no dialog was found
- ❌: Focus is not lost when each dialog closes (R): fail - Unable to test because no dialog was found
- ❌: Each modal dialog hides content behind it while open (R): fail - Unable to test because no dialog was found
Axe WCAG Failures (2) ❌
- (1x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
- (1x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (4) ⚠️
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 22 (Gemini 3 Flash Preview)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 2 | BP: 4
Assertions ❌
- ❌: Each dialog has a dialog role (R): fail
- ❌: Each dialog can be closed by escape key (BP): fail - Unable to test because no dialog was found
- ❌: Each modal dialog traps keyboard focus (R): fail - Unable to test because no dialog was found
- ❌: Each modal dialog takes focus when opened (R): fail - Unable to test because no dialog was found
- ❌: Focus is not lost when each dialog closes (R): fail - Unable to test because no dialog was found
- ❌: Each modal dialog hides content behind it while open (R): fail - Unable to test because no dialog was found
Axe WCAG Failures (2) ❌
- (1x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
- (1x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (4) ⚠️
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 23 (Gemini 3 Flash Preview)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 2 | BP: 4
Assertions ❌
- ❌: Each dialog has a dialog role (R): fail
- ❌: Each dialog can be closed by escape key (BP): fail - Unable to test because no dialog was found
- ❌: Each modal dialog traps keyboard focus (R): fail - Unable to test because no dialog was found
- ❌: Each modal dialog takes focus when opened (R): fail - Unable to test because no dialog was found
- ❌: Focus is not lost when each dialog closes (R): fail - Unable to test because no dialog was found
- ❌: Each modal dialog hides content behind it while open (R): fail - Unable to test because no dialog was found
Axe WCAG Failures (2) ❌
- (1x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
- (1x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (4) ⚠️
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 24 (Gemini 3 Flash Preview)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 2 | BP: 6
Assertions ❌
- ❌: Each dialog has a dialog role (R): fail
- ❌: Each dialog can be closed by escape key (BP): fail - Unable to test because no dialog was found
- ❌: Each modal dialog traps keyboard focus (R): fail - Unable to test because no dialog was found
- ❌: Each modal dialog takes focus when opened (R): fail - Unable to test because no dialog was found
- ❌: Focus is not lost when each dialog closes (R): fail - Unable to test because no dialog was found
- ❌: Each modal dialog hides content behind it while open (R): fail - Unable to test because no dialog was found
Axe WCAG Failures (2) ❌
- (1x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
- (1x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (6) ⚠️
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 0 (Gemini 3 Flash Preview)
Instruction set: 1. Basic
PASS | Latency 0.00s
Axe WCAG: 0
Assertions ✅
- ✅: Each dialog has a dialog role (R): pass
- ✅: Each dialog can be closed by escape key (BP): pass
- ✅: Each modal dialog traps keyboard focus (R): pass
- ✅: Each modal dialog takes focus when opened (R): pass
- ✅: Focus is not lost when each dialog closes (R): pass
- ✅: Each modal dialog hides content behind it while open (R): pass
Sample 1 (Gemini 3 Flash Preview)
Instruction set: 1. Basic
PASS | Latency 0.00s
Axe WCAG: 0
Assertions ✅
- ✅: Each dialog has a dialog role (R): pass
- ✅: Each dialog can be closed by escape key (BP): pass
- ✅: Each modal dialog traps keyboard focus (R): pass
- ✅: Each modal dialog takes focus when opened (R): pass
- ✅: Focus is not lost when each dialog closes (R): pass
- ✅: Each modal dialog hides content behind it while open (R): pass
Sample 2 (Gemini 3 Flash Preview)
Instruction set: 1. Basic
PASS | Latency 0.00s
Axe WCAG: 0
Assertions ✅
- ✅: Each dialog has a dialog role (R): pass
- ✅: Each dialog can be closed by escape key (BP): pass
- ✅: Each modal dialog traps keyboard focus (R): pass
- ✅: Each modal dialog takes focus when opened (R): pass
- ✅: Focus is not lost when each dialog closes (R): pass
- ✅: Each modal dialog hides content behind it while open (R): pass
Sample 3 (Gemini 3 Flash Preview)
Instruction set: 1. Basic
PASS | Latency 0.00s
Axe WCAG: 0
Assertions ✅
- ✅: Each dialog has a dialog role (R): pass
- ✅: Each dialog can be closed by escape key (BP): pass
- ✅: Each modal dialog traps keyboard focus (R): pass
- ✅: Each modal dialog takes focus when opened (R): pass
- ✅: Focus is not lost when each dialog closes (R): pass
- ✅: Each modal dialog hides content behind it while open (R): pass
Sample 4 (Gemini 3 Flash Preview)
Instruction set: 1. Basic
PASS | Latency 0.00s
Axe WCAG: 0
Assertions ✅
- ✅: Each dialog has a dialog role (R): pass
- ✅: Each dialog can be closed by escape key (BP): pass
- ✅: Each modal dialog traps keyboard focus (R): pass
- ✅: Each modal dialog takes focus when opened (R): pass
- ✅: Focus is not lost when each dialog closes (R): pass
- ✅: Each modal dialog hides content behind it while open (R): pass
Sample 0 (Gemini 3 Flash Preview)
Instruction set: 0. Minimal
PASS | Latency 0.00s
Axe WCAG: 0
Assertions ✅
- ✅: Each dialog has a dialog role (R): pass
- ✅: Each dialog can be closed by escape key (BP): pass
- ✅: Each modal dialog traps keyboard focus (R): pass
- ✅: Each modal dialog takes focus when opened (R): pass
- ✅: Focus is not lost when each dialog closes (R): pass
- ✅: Each modal dialog hides content behind it while open (R): pass
Sample 1 (Gemini 3 Flash Preview)
Instruction set: 0. Minimal
PASS | Latency 0.00s
Axe WCAG: 0
Assertions ✅
- ✅: Each dialog has a dialog role (R): pass
- ✅: Each dialog can be closed by escape key (BP): pass
- ✅: Each modal dialog traps keyboard focus (R): pass
- ✅: Each modal dialog takes focus when opened (R): pass
- ✅: Focus is not lost when each dialog closes (R): pass
- ✅: Each modal dialog hides content behind it while open (R): pass
Sample 2 (Gemini 3 Flash Preview)
Instruction set: 0. Minimal
PASS | Latency 0.00s
Axe WCAG: 0
Assertions ✅
- ✅: Each dialog has a dialog role (R): pass
- ✅: Each dialog can be closed by escape key (BP): pass
- ✅: Each modal dialog traps keyboard focus (R): pass
- ✅: Each modal dialog takes focus when opened (R): pass
- ✅: Focus is not lost when each dialog closes (R): pass
- ✅: Each modal dialog hides content behind it while open (R): pass
Sample 3 (Gemini 3 Flash Preview)
Instruction set: 0. Minimal
PASS | Latency 0.00s
Axe WCAG: 0
Assertions ✅
- ✅: Each dialog has a dialog role (R): pass
- ✅: Each dialog can be closed by escape key (BP): pass
- ✅: Each modal dialog traps keyboard focus (R): pass
- ✅: Each modal dialog takes focus when opened (R): pass
- ✅: Focus is not lost when each dialog closes (R): pass
- ✅: Each modal dialog hides content behind it while open (R): pass
Sample 4 (Gemini 3 Flash Preview)
Instruction set: 0. Minimal
PASS | Latency 0.00s
Axe WCAG: 0
Assertions ✅
- ✅: Each dialog has a dialog role (R): pass
- ✅: Each dialog can be closed by escape key (BP): pass
- ✅: Each modal dialog traps keyboard focus (R): pass
- ✅: Each modal dialog takes focus when opened (R): pass
- ✅: Focus is not lost when each dialog closes (R): pass
- ✅: Each modal dialog hides content behind it while open (R): pass
Sample 0 (Gemini 3 Flash Preview)
Instruction set: 2. Detailed Instructions
FAIL | Latency 0.00s
Axe WCAG: 0
Assertions ❌
- ✅: Each dialog has a dialog role (R): pass
- ✅: Each dialog can be closed by escape key (BP): pass
- ✅: Each modal dialog traps keyboard focus (R): pass
- ✅: Each modal dialog takes focus when opened (R): pass
- ✅: Focus is not lost when each dialog closes (R): pass
- ❌: Each modal dialog hides content behind it while open (R): fail
Sample 1 (Gemini 3 Flash Preview)
Instruction set: 2. Detailed Instructions
FAIL | Latency 0.00s
Axe WCAG: 0
Assertions ❌
- ✅: Each dialog has a dialog role (R): pass
- ✅: Each dialog can be closed by escape key (BP): pass
- ✅: Each modal dialog traps keyboard focus (R): pass
- ✅: Each modal dialog takes focus when opened (R): pass
- ✅: Focus is not lost when each dialog closes (R): pass
- ❌: Each modal dialog hides content behind it while open (R): fail
Sample 2 (Gemini 3 Flash Preview)
Instruction set: 2. Detailed Instructions
FAIL | Latency 0.00s
Axe WCAG: 0
Assertions ❌
- ✅: Each dialog has a dialog role (R): pass
- ✅: Each dialog can be closed by escape key (BP): pass
- ✅: Each modal dialog traps keyboard focus (R): pass
- ✅: Each modal dialog takes focus when opened (R): pass
- ✅: Focus is not lost when each dialog closes (R): pass
- ❌: Each modal dialog hides content behind it while open (R): fail
Sample 3 (Gemini 3 Flash Preview)
Instruction set: 2. Detailed Instructions
FAIL | Latency 0.00s
Axe WCAG: 0
Assertions ❌
- ✅: Each dialog has a dialog role (R): pass
- ✅: Each dialog can be closed by escape key (BP): pass
- ✅: Each modal dialog traps keyboard focus (R): pass
- ✅: Each modal dialog takes focus when opened (R): pass
- ✅: Focus is not lost when each dialog closes (R): pass
- ❌: Each modal dialog hides content behind it while open (R): fail
Sample 4 (Gemini 3 Flash Preview)
Instruction set: 2. Detailed Instructions
FAIL | Latency 0.00s
Axe WCAG: 0
Assertions ❌
- ✅: Each dialog has a dialog role (R): pass
- ✅: Each dialog can be closed by escape key (BP): pass
- ✅: Each modal dialog traps keyboard focus (R): pass
- ✅: Each modal dialog takes focus when opened (R): pass
- ✅: Focus is not lost when each dialog closes (R): pass
- ❌: Each modal dialog hides content behind it while open (R): fail
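The PASS/FAIL verdicts in the listings above appear consistent with a simple rule: a sample passes only when every assertion tagged (R) passes and the Axe WCAG failure count is zero, while Axe best-practice issues are reported but do not affect the verdict. The Python sketch below encodes that reading. The `Assertion` and `Sample` types and the `sample_passes` function are hypothetical names, not part of the benchmark, and treating (BP) assertions as non-blocking is an assumption that the listings neither confirm nor rule out.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Assertion:
    name: str
    kind: str      # "R" (required) or "BP" (best practice), as tagged in the listings
    passed: bool


@dataclass
class Sample:
    axe_wcag_failures: int
    assertions: List[Assertion] = field(default_factory=list)
    axe_best_practice_issues: int = 0  # reported only; never changes the verdict


def sample_passes(sample: Sample) -> bool:
    """Assumed rule: all (R) assertions pass and Axe reports no WCAG failures."""
    required_ok = all(a.passed for a in sample.assertions if a.kind == "R")
    return required_ok and sample.axe_wcag_failures == 0


# Mirrors a "2. Detailed Instructions" sample: one required assertion fails.
detailed = Sample(
    axe_wcag_failures=0,
    assertions=[
        Assertion("Each dialog has a dialog role", "R", True),
        Assertion("Each dialog can be closed by escape key", "BP", True),
        Assertion("Each modal dialog hides content behind it while open", "R", False),
    ],
)

# Mirrors a "1. Basic" sample: everything passes and Axe WCAG is 0.
basic = Sample(
    axe_wcag_failures=0,
    assertions=[Assertion("Each dialog has a dialog role", "R", True)],
)

print(sample_passes(detailed), sample_passes(basic))
```

Under this reading, a sample with clean assertions but a non-zero Axe WCAG count would still be reported as FAIL.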
Gemini 3.5 Pro Preview
Pass rates, in the order of the aggregates below: 0%, 80%, 100%, 40%
Aggregates are shown when filtering to a specific instruction set.
Samples: 25 | Passes: 0
| pass@1 | pass@5 | pass@10 |
|---|---|---|
| 0% | 0% | 0% |
Samples: 5 | Passes: 4
| pass@1 | pass@5 | pass@10 |
|---|---|---|
| 80% | 100% | 100% |
Samples: 5 | Passes: 5
| pass@1 | pass@5 | pass@10 |
|---|---|---|
| 100% | 100% | 100% |
Samples: 5 | Passes: 2
| pass@1 | pass@5 | pass@10 |
|---|---|---|
| 40% | 100% | 100% |
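The page does not state how pass@k is calculated. The figures above are reproduced by the commonly used combinatorial estimator, 1 - C(n - c, k) / C(n, k), where n is the number of samples and c the number of passes. The Python sketch below assumes that estimator. The function name `pass_at_k` is hypothetical, and the handling of k larger than n (returning 100% whenever at least one sample passed) is an assumption chosen because it matches the 5-sample rows.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Estimate the chance that at least one of k samples, drawn without
    replacement from n samples of which c passed, is a passing sample."""
    if n <= 0 or k <= 0:
        raise ValueError("n and k must be positive")
    if not 0 <= c <= n:
        raise ValueError("c must be between 0 and n")
    if c == 0:
        return 0.0  # no passing sample can ever be drawn
    if n - c < k:
        return 1.0  # fewer failing samples than draws (assumed also for k > n)
    return 1.0 - comb(n - c, k) / comb(n, k)


# (samples, passes) pairs taken from the aggregates above.
for n, c in [(25, 0), (5, 4), (5, 5), (5, 2)]:
    row = [f"{pass_at_k(n, c, k):.0%}" for k in (1, 5, 10)]
    print(f"Samples: {n} | Passes: {c} ->", " | ".join(row))
```

With only 5 samples per instruction set, pass@5 and pass@10 saturate at 100% as soon as a single sample passes, so pass@1 is the only column that separates those rows.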
Sample 0 (Gemini 3.5 Pro Preview)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 2 | BP: 4
Assertions ❌
- ❌: Each dialog has a dialog role (R): fail
- ❌: Each dialog can be closed by escape key (BP): fail - Unable to test because no dialog was found
- ❌: Each modal dialog traps keyboard focus (R): fail - Unable to test because no dialog was found
- ❌: Each modal dialog takes focus when opened (R): fail - Unable to test because no dialog was found
- ❌: Focus is not lost when each dialog closes (R): fail - Unable to test because no dialog was found
- ❌: Each modal dialog hides content behind it while open (R): fail - Unable to test because no dialog was found
Axe WCAG Failures (2) ❌
- (1x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
- (1x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (4) ⚠️
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
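The color-contrast failures reported above refer to the WCAG 2 AA minimum contrast ratio: 4.5:1 for normal-size text and 3:1 for large text. As a rough illustration of what is being measured, the Python sketch below computes the WCAG contrast ratio for two opaque sRGB colours. The function names are hypothetical, and the sketch covers only the ratio formula; a real checker such as Axe also has to resolve font size and weight, opacity, and layered backgrounds, none of which is modelled here.

```python
from typing import Tuple

RGB = Tuple[int, int, int]


def _linear(channel: int) -> float:
    """Convert one 8-bit sRGB channel to linear light."""
    s = channel / 255
    # 0.04045 is the sRGB cutoff; older WCAG text prints 0.03928, which
    # gives identical results for 8-bit channel values.
    return s / 12.92 if s <= 0.04045 else ((s + 0.055) / 1.055) ** 2.4


def relative_luminance(rgb: RGB) -> float:
    r, g, b = (_linear(c) for c in rgb)
    return 0.2126 * r + 0.7152 * g + 0.0722 * b


def contrast_ratio(foreground: RGB, background: RGB) -> float:
    """WCAG contrast ratio, from 1 (identical colours) to 21 (black on white)."""
    l1 = relative_luminance(foreground)
    l2 = relative_luminance(background)
    lighter, darker = max(l1, l2), min(l1, l2)
    return (lighter + 0.05) / (darker + 0.05)


def meets_aa(foreground: RGB, background: RGB, large_text: bool = False) -> bool:
    threshold = 3.0 if large_text else 4.5
    return contrast_ratio(foreground, background) >= threshold


white = (255, 255, 255)
print(meets_aa((0x76, 0x76, 0x76), white))  # mid grey on white, normal text
print(meets_aa((0x77, 0x77, 0x77), white))  # one step lighter
```

The two greys in the usage example sit on either side of the 4.5:1 threshold against white, which shows how small a colour change can flip this check.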
Sample 1 (Gemini 3.5 Pro Preview)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 2
Assertions ❌
- ✅: Each dialog has a dialog role (R): pass
- ✅: Each dialog can be closed by escape key (BP): pass
- ✅: Each modal dialog traps keyboard focus (R): pass
- ❌: Each modal dialog takes focus when opened (R): fail
- ✅: Focus is not lost when each dialog closes (R): pass
- ❌: Each modal dialog hides content behind it while open (R): fail
Axe WCAG Failures (2) ❌
- (1x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
- (1x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Sample 2 (Gemini 3.5 Pro Preview)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 2 | BP: 8
Assertions ❌
- ❌: Each dialog has a dialog role (R): fail
- ❌: Each dialog can be closed by escape key (BP): fail - Unable to test because no dialog was found
- ❌: Each modal dialog traps keyboard focus (R): fail - Unable to test because no dialog was found
- ❌: Each modal dialog takes focus when opened (R): fail - Unable to test because no dialog was found
- ❌: Focus is not lost when each dialog closes (R): fail - Unable to test because no dialog was found
- ❌: Each modal dialog hides content behind it while open (R): fail - Unable to test because no dialog was found
Axe WCAG Failures (2) ❌
- (1x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
- (1x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (8) ⚠️
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 3 (Gemini 3.5 Pro Preview)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 2 | BP: 6
Assertions ❌
- ❌: Each dialog has a dialog role (R): fail
- ❌: Each dialog can be closed by escape key (BP): fail - Unable to test because no dialog was found
- ❌: Each modal dialog traps keyboard focus (R): fail - Unable to test because no dialog was found
- ❌: Each modal dialog takes focus when opened (R): fail - Unable to test because no dialog was found
- ❌: Focus is not lost when each dialog closes (R): fail - Unable to test because no dialog was found
- ❌: Each modal dialog hides content behind it while open (R): fail - Unable to test because no dialog was found
Axe WCAG Failures (2) ❌
- (1x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
- (1x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (6) ⚠️
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 4 (Gemini 3.5 Pro Preview)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 0 | BP: 8
Assertions ❌
- ❌: Each dialog has a dialog role (R): fail
- ❌: Each dialog can be closed by escape key (BP): fail - Unable to test because no dialog was found
- ❌: Each modal dialog traps keyboard focus (R): fail - Unable to test because no dialog was found
- ❌: Each modal dialog takes focus when opened (R): fail - Unable to test because no dialog was found
- ❌: Focus is not lost when each dialog closes (R): fail - Unable to test because no dialog was found
- ❌: Each modal dialog hides content behind it while open (R): fail - Unable to test because no dialog was found
Axe Best Practice Issues (8) ⚠️
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 5 (Gemini 3.5 Pro Preview)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 2 | BP: 2
Assertions ❌
- ❌: Each dialog has a dialog role (R): fail
- ❌: Each dialog can be closed by escape key (BP): fail - Unable to test because no dialog was found
- ❌: Each modal dialog traps keyboard focus (R): fail - Unable to test because no dialog was found
- ❌: Each modal dialog takes focus when opened (R): fail - Unable to test because no dialog was found
- ❌: Focus is not lost when each dialog closes (R): fail - Unable to test because no dialog was found
- ❌: Each modal dialog hides content behind it while open (R): fail - Unable to test because no dialog was found
Axe WCAG Failures (2) ❌
- (1x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
- (1x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (2) ⚠️
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 6 (Gemini 3.5 Pro Preview)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 0 | BP: 2
Assertions ❌
- ❌: Each dialog has a dialog role (R): fail
- ❌: Each dialog can be closed by escape key (BP): fail - Unable to test because no dialog was found
- ❌: Each modal dialog traps keyboard focus (R): fail - Unable to test because no dialog was found
- ❌: Each modal dialog takes focus when opened (R): fail - Unable to test because no dialog was found
- ❌: Focus is not lost when each dialog closes (R): fail - Unable to test because no dialog was found
- ❌: Each modal dialog hides content behind it while open (R): fail - Unable to test because no dialog was found
Axe Best Practice Issues (2) ⚠️
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 7 (Gemini 3.5 Pro Preview)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 0 | BP: 2
Assertions ❌
- ❌: Each dialog has a dialog role (R): fail
- ❌: Each dialog can be closed by escape key (BP): fail - Unable to test because no dialog was found
- ❌: Each modal dialog traps keyboard focus (R): fail - Unable to test because no dialog was found
- ❌: Each modal dialog takes focus when opened (R): fail - Unable to test because no dialog was found
- ❌: Focus is not lost when each dialog closes (R): fail - Unable to test because no dialog was found
- ❌: Each modal dialog hides content behind it while open (R): fail - Unable to test because no dialog was found
Axe Best Practice Issues (2) ⚠️
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 8 (Gemini 3.5 Pro Preview)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 0 | BP: 2
Assertions ❌
- ❌: Each dialog has a dialog role (R): fail
- ❌: Each dialog can be closed by escape key (BP): fail - Unable to test because no dialog was found
- ❌: Each modal dialog traps keyboard focus (R): fail - Unable to test because no dialog was found
- ❌: Each modal dialog takes focus when opened (R): fail - Unable to test because no dialog was found
- ❌: Focus is not lost when each dialog closes (R): fail - Unable to test because no dialog was found
- ❌: Each modal dialog hides content behind it while open (R): fail - Unable to test because no dialog was found
Axe Best Practice Issues (2) ⚠️
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 9 (Gemini 3.5 Pro Preview)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 0 | BP: 2
Assertions ❌
- ❌: Each dialog has a dialog role (R): fail
- ❌: Each dialog can be closed by escape key (BP): fail - Unable to test because no dialog was found
- ❌: Each modal dialog traps keyboard focus (R): fail - Unable to test because no dialog was found
- ❌: Each modal dialog takes focus when opened (R): fail - Unable to test because no dialog was found
- ❌: Focus is not lost when each dialog closes (R): fail - Unable to test because no dialog was found
- ❌: Each modal dialog hides content behind it while open (R): fail - Unable to test because no dialog was found
Axe Best Practice Issues (2) ⚠️
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 10 (Gemini 3.5 Pro Preview)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 0 | BP: 2
Assertions ❌
- ❌: Each dialog has a dialog role (R): fail
- ❌: Each dialog can be closed by escape key (BP): fail - Unable to test because no dialog was found
- ❌: Each modal dialog traps keyboard focus (R): fail - Unable to test because no dialog was found
- ❌: Each modal dialog takes focus when opened (R): fail - Unable to test because no dialog was found
- ❌: Focus is not lost when each dialog closes (R): fail - Unable to test because no dialog was found
- ❌: Each modal dialog hides content behind it while open (R): fail - Unable to test because no dialog was found
Axe Best Practice Issues (2) ⚠️
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 11 (Gemini 3.5 Pro Preview)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 2 | BP: 2
Assertions ❌
- ❌: Each dialog has a dialog role (R): fail
- ❌: Each dialog can be closed by escape key (BP): fail - Unable to test because no dialog was found
- ❌: Each modal dialog traps keyboard focus (R): fail - Unable to test because no dialog was found
- ❌: Each modal dialog takes focus when opened (R): fail - Unable to test because no dialog was found
- ❌: Focus is not lost when each dialog closes (R): fail - Unable to test because no dialog was found
- ❌: Each modal dialog hides content behind it while open (R): fail - Unable to test because no dialog was found
Axe WCAG Failures (2) ❌
- (1x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
- (1x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (2) ⚠️
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 12 (Gemini 3.5 Pro Preview)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 0 | BP: 6
Assertions ❌
- ❌: Each dialog has a dialog role (R): fail
- ❌: Each dialog can be closed by escape key (BP): fail - Unable to test because no dialog was found
- ❌: Each modal dialog traps keyboard focus (R): fail - Unable to test because no dialog was found
- ❌: Each modal dialog takes focus when opened (R): fail - Unable to test because no dialog was found
- ❌: Focus is not lost when each dialog closes (R): fail - Unable to test because no dialog was found
- ❌: Each modal dialog hides content behind it while open (R): fail - Unable to test because no dialog was found
Axe Best Practice Issues (6) ⚠️
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 13 (Gemini 3.5 Pro Preview)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 2 | BP: 6
Assertions ❌
- ❌: Each dialog has a dialog role (R): fail
- ❌: Each dialog can be closed by escape key (BP): fail - Unable to test because no dialog was found
- ❌: Each modal dialog traps keyboard focus (R): fail - Unable to test because no dialog was found
- ❌: Each modal dialog takes focus when opened (R): fail - Unable to test because no dialog was found
- ❌: Focus is not lost when each dialog closes (R): fail - Unable to test because no dialog was found
- ❌: Each modal dialog hides content behind it while open (R): fail - Unable to test because no dialog was found
Axe WCAG Failures (2) ❌
- (1x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
- (1x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (6) ⚠️
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 14 (Gemini 3.5 Pro Preview)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 0 | BP: 2
Assertions ❌
- ❌: Each dialog has a dialog role (R): fail
- ❌: Each dialog can be closed by escape key (BP): fail - Unable to test because no dialog was found
- ❌: Each modal dialog traps keyboard focus (R): fail - Unable to test because no dialog was found
- ❌: Each modal dialog takes focus when opened (R): fail - Unable to test because no dialog was found
- ❌: Focus is not lost when each dialog closes (R): fail - Unable to test because no dialog was found
- ❌: Each modal dialog hides content behind it while open (R): fail - Unable to test because no dialog was found
Axe Best Practice Issues (2) ⚠️
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 15 (Gemini 3.5 Pro Preview)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 0 | BP: 2
Assertions ❌
- ❌: Each dialog has a dialog role (R): fail
- ❌: Each dialog can be closed by escape key (BP): fail - Unable to test because no dialog was found
- ❌: Each modal dialog traps keyboard focus (R): fail - Unable to test because no dialog was found
- ❌: Each modal dialog takes focus when opened (R): fail - Unable to test because no dialog was found
- ❌: Focus is not lost when each dialog closes (R): fail - Unable to test because no dialog was found
- ❌: Each modal dialog hides content behind it while open (R): fail - Unable to test because no dialog was found
Axe Best Practice Issues (2) ⚠️
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 16 (Gemini 3.5 Pro Preview)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 0 | BP: 2
Assertions ❌
- ❌: Each dialog has a dialog role (R): fail
- ❌: Each dialog can be closed by escape key (BP): fail - Unable to test because no dialog was found
- ❌: Each modal dialog traps keyboard focus (R): fail - Unable to test because no dialog was found
- ❌: Each modal dialog takes focus when opened (R): fail - Unable to test because no dialog was found
- ❌: Focus is not lost when each dialog closes (R): fail - Unable to test because no dialog was found
- ❌: Each modal dialog hides content behind it while open (R): fail - Unable to test because no dialog was found
Axe Best Practice Issues (2) ⚠️
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 17 (Gemini 3.5 Pro Preview)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 0 | BP: 2
Assertions ❌
- ❌: Each dialog has a dialog role (R): fail
- ❌: Each dialog can be closed by escape key (BP): fail - Unable to test because no dialog was found
- ❌: Each modal dialog traps keyboard focus (R): fail - Unable to test because no dialog was found
- ❌: Each modal dialog takes focus when opened (R): fail - Unable to test because no dialog was found
- ❌: Focus is not lost when each dialog closes (R): fail - Unable to test because no dialog was found
- ❌: Each modal dialog hides content behind it while open (R): fail - Unable to test because no dialog was found
Axe Best Practice Issues (2) ⚠️
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 18 (Gemini 3.5 Pro Preview)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 2
Assertions ❌
- ✅: Each dialog has a dialog role (R): pass
- ✅: Each dialog can be closed by escape key (BP): pass
- ✅: Each modal dialog traps keyboard focus (R): pass
- ❌: Each modal dialog takes focus when opened (R): fail
- ✅: Focus is not lost when each dialog closes (R): pass
- ❌: Each modal dialog hides content behind it while open (R): fail
Axe WCAG Failures (2) ❌
- (1x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
- (1x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Sample 19 (Gemini 3.5 Pro Preview)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 2 | BP: 6
Assertions ❌
- ❌: Each dialog has a dialog role (R): fail
- ❌: Each dialog can be closed by escape key (BP): fail - Unable to test because no dialog was found
- ❌: Each modal dialog traps keyboard focus (R): fail - Unable to test because no dialog was found
- ❌: Each modal dialog takes focus when opened (R): fail - Unable to test because no dialog was found
- ❌: Focus is not lost when each dialog closes (R): fail - Unable to test because no dialog was found
- ❌: Each modal dialog hides content behind it while open (R): fail - Unable to test because no dialog was found
Axe WCAG Failures (2) ❌
- (1x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
- (1x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (6) ⚠️
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 20 (Gemini 3.5 Pro Preview)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 2 | BP: 4
Assertions ❌
- ❌: Each dialog has a dialog role (R): fail
- ❌: Each dialog can be closed by escape key (BP): fail - Unable to test because no dialog was found
- ❌: Each modal dialog traps keyboard focus (R): fail - Unable to test because no dialog was found
- ❌: Each modal dialog takes focus when opened (R): fail - Unable to test because no dialog was found
- ❌: Focus is not lost when each dialog closes (R): fail - Unable to test because no dialog was found
- ❌: Each modal dialog hides content behind it while open (R): fail - Unable to test because no dialog was found
Axe WCAG Failures (2) ❌
- (1x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
- (1x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (4) ⚠️
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 21 (Gemini 3.5 Pro Preview)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 2 | BP: 4
Assertions ❌
- ❌: Each dialog has a dialog role (R): fail
- ❌: Each dialog can be closed by escape key (BP): fail - Unable to test because no dialog was found
- ❌: Each modal dialog traps keyboard focus (R): fail - Unable to test because no dialog was found
- ❌: Each modal dialog takes focus when opened (R): fail - Unable to test because no dialog was found
- ❌: Focus is not lost when each dialog closes (R): fail - Unable to test because no dialog was found
- ❌: Each modal dialog hides content behind it while open (R): fail - Unable to test because no dialog was found
Axe WCAG Failures (2) ❌
- (1x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
- (1x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (4) ⚠️
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 22 (Gemini 3.5 Pro Preview)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 0 | BP: 4
Assertions ❌
- ❌: Each dialog has a dialog role (R): fail
- ❌: Each dialog can be closed by escape key (BP): fail - Unable to test because no dialog was found
- ❌: Each modal dialog traps keyboard focus (R): fail - Unable to test because no dialog was found
- ❌: Each modal dialog takes focus when opened (R): fail - Unable to test because no dialog was found
- ❌: Focus is not lost when each dialog closes (R): fail - Unable to test because no dialog was found
- ❌: Each modal dialog hides content behind it while open (R): fail - Unable to test because no dialog was found
Axe Best Practice Issues (4) ⚠️
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 23 (Gemini 3.5 Pro Preview)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 4 | BP: 4
Assertions ❌
- ❌: Each dialog has a dialog role (R): fail
- ❌: Each dialog can be closed by escape key (BP): fail - Unable to test because no dialog was found
- ❌: Each modal dialog traps keyboard focus (R): fail - Unable to test because no dialog was found
- ❌: Each modal dialog takes focus when opened (R): fail - Unable to test because no dialog was found
- ❌: Focus is not lost when each dialog closes (R): fail - Unable to test because no dialog was found
- ❌: Each modal dialog hides content behind it while open (R): fail - Unable to test because no dialog was found
Axe WCAG Failures (4) ❌
- (2x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
- (2x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (4) ⚠️
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 24 (Gemini 3.5 Pro Preview)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 0 | BP: 4
Assertions ❌
- ❌: Each dialog has a dialog role (R): fail
- ❌: Each dialog can be closed by escape key (BP): fail - Unable to test because no dialog was found
- ❌: Each modal dialog traps keyboard focus (R): fail - Unable to test because no dialog was found
- ❌: Each modal dialog takes focus when opened (R): fail - Unable to test because no dialog was found
- ❌: Focus is not lost when each dialog closes (R): fail - Unable to test because no dialog was found
- ❌: Each modal dialog hides content behind it while open (R): fail - Unable to test because no dialog was found
Axe Best Practice Issues (4) ⚠️
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 0 (Gemini 3.5 Pro Preview)
Instruction set: 1. Basic
PASS | Latency 0.00s
Axe WCAG: 0
Assertions ✅
- ✅: Each dialog has a dialog role (R): pass
- ✅: Each dialog can be closed by escape key (BP): pass
- ✅: Each modal dialog traps keyboard focus (R): pass
- ✅: Each modal dialog takes focus when opened (R): pass
- ✅: Focus is not lost when each dialog closes (R): pass
- ✅: Each modal dialog hides content behind it while open (R): pass
Sample 1 (Gemini 3.5 Pro Preview)
Instruction set: 1. Basic
PASS | Latency 0.00s
Axe WCAG: 0
Assertions ✅
- ✅: Each dialog has a dialog role (R): pass
- ✅: Each dialog can be closed by escape key (BP): pass
- ✅: Each modal dialog traps keyboard focus (R): pass
- ✅: Each modal dialog takes focus when opened (R): pass
- ✅: Focus is not lost when each dialog closes (R): pass
- ✅: Each modal dialog hides content behind it while open (R): pass
Sample 2 (Gemini 3.5 Pro Preview)
Instruction set: 1. Basic
PASS | Latency 0.00s
Axe WCAG: 0
Assertions ✅
- ✅: Each dialog has a dialog role (R): pass
- ✅: Each dialog can be closed by escape key (BP): pass
- ✅: Each modal dialog traps keyboard focus (R): pass
- ✅: Each modal dialog takes focus when opened (R): pass
- ✅: Focus is not lost when each dialog closes (R): pass
- ✅: Each modal dialog hides content behind it while open (R): pass
Sample 3 (Gemini 3.5 Pro Preview)
Instruction set: 1. Basic
PASS | Latency 0.00s
Axe WCAG: 0
Assertions ✅
- ✅: Each dialog has a dialog role (R): pass
- ✅: Each dialog can be closed by escape key (BP): pass
- ✅: Each modal dialog traps keyboard focus (R): pass
- ✅: Each modal dialog takes focus when opened (R): pass
- ✅: Focus is not lost when each dialog closes (R): pass
- ✅: Each modal dialog hides content behind it while open (R): pass
Sample 4 (Gemini 3.5 Pro Preview)
Instruction set: 1. Basic
PASS | Latency 0.00s
Axe WCAG: 0
Assertions ✅
- ✅: Each dialog has a dialog role (R): pass
- ✅: Each dialog can be closed by escape key (BP): pass
- ✅: Each modal dialog traps keyboard focus (R): pass
- ✅: Each modal dialog takes focus when opened (R): pass
- ✅: Focus is not lost when each dialog closes (R): pass
- ✅: Each modal dialog hides content behind it while open (R): pass
Sample 0 (Gemini 3.5 Pro Preview)
Instruction set: 0. Minimal
PASS | Latency 0.00s
Axe WCAG: 0
Assertions ✅
- ✅: Each dialog has a dialog role (R): pass
- ✅: Each dialog can be closed by escape key (BP): pass
- ✅: Each modal dialog traps keyboard focus (R): pass
- ✅: Each modal dialog takes focus when opened (R): pass
- ✅: Focus is not lost when each dialog closes (R): pass
- ✅: Each modal dialog hides content behind it while open (R): pass
Sample 1 (Gemini 3.5 Pro Preview)
Instruction set: 0. Minimal
FAIL | Latency 0.00s
Axe WCAG: 0
Assertions ❌
- ✅: Each dialog has a dialog role (R): pass
- ✅: Each dialog can be closed by escape key (BP): pass
- ✅: Each modal dialog traps keyboard focus (R): pass
- ✅: Each modal dialog takes focus when opened (R): pass
- ✅: Focus is not lost when each dialog closes (R): pass
- ❌: Each modal dialog hides content behind it while open (R): fail
Sample 2 (Gemini 3.5 Pro Preview)
Instruction set: 0. Minimal
PASS | Latency 0.00s
Axe WCAG: 0
Assertions ✅
- ✅: Each dialog has a dialog role (R): pass
- ✅: Each dialog can be closed by escape key (BP): pass
- ✅: Each modal dialog traps keyboard focus (R): pass
- ✅: Each modal dialog takes focus when opened (R): pass
- ✅: Focus is not lost when each dialog closes (R): pass
- ✅: Each modal dialog hides content behind it while open (R): pass
Sample 3 (Gemini 3.5 Pro Preview)
Instruction set: 0. Minimal
PASS | Latency 0.00s
Axe WCAG: 0
Assertions ✅
- ✅: Each dialog has a dialog role (R): pass
- ✅: Each dialog can be closed by escape key (BP): pass
- ✅: Each modal dialog traps keyboard focus (R): pass
- ✅: Each modal dialog takes focus when opened (R): pass
- ✅: Focus is not lost when each dialog closes (R): pass
- ✅: Each modal dialog hides content behind it while open (R): pass
Sample 4 (Gemini 3.5 Pro Preview)
Instruction set: 0. Minimal
PASS | Latency 0.00s
Axe WCAG: 0
Assertions ✅
- ✅: Each dialog has a dialog role (R): pass
- ✅: Each dialog can be closed by escape key (BP): pass
- ✅: Each modal dialog traps keyboard focus (R): pass
- ✅: Each modal dialog takes focus when opened (R): pass
- ✅: Focus is not lost when each dialog closes (R): pass
- ✅: Each modal dialog hides content behind it while open (R): pass
Sample 0 (Gemini 3.5 Pro Preview)
Instruction set: 2. Detailed Instructions
FAIL | Latency 0.00s
Axe WCAG: 0
Assertions ❌
- ✅: Each dialog has a dialog role (R): pass
- ✅: Each dialog can be closed by escape key (BP): pass
- ✅: Each modal dialog traps keyboard focus (R): pass
- ❌: Each modal dialog takes focus when opened (R): fail
- ✅: Focus is not lost when each dialog closes (R): pass
- ❌: Each modal dialog hides content behind it while open (R): fail
Sample 1 (Gemini 3.5 Pro Preview)
Instruction set: 2. Detailed Instructions
FAIL | Latency 0.00s
Axe WCAG: 0
Assertions ❌
- ✅: Each dialog has a dialog role (R): pass
- ✅: Each dialog can be closed by escape key (BP): pass
- ✅: Each modal dialog traps keyboard focus (R): pass
- ❌: Each modal dialog takes focus when opened (R): fail
- ✅: Focus is not lost when each dialog closes (R): pass
- ❌: Each modal dialog hides content behind it while open (R): fail
Sample 2 (Gemini 3.5 Pro Preview)
Instruction set: 2. Detailed Instructions
FAIL | Latency 0.00s
Axe WCAG: 0
Assertions ❌
- ✅: Each dialog has a dialog role (R): pass
- ✅: Each dialog can be closed by escape key (BP): pass
- ✅: Each modal dialog traps keyboard focus (R): pass
- ❌: Each modal dialog takes focus when opened (R): fail
- ✅: Focus is not lost when each dialog closes (R): pass
- ❌: Each modal dialog hides content behind it while open (R): fail
Sample 3 (Gemini 3.5 Pro Preview)
Instruction set: 2. Detailed Instructions
PASS | Latency 0.00s
Axe WCAG: 0
Assertions ✅
- ✅: Each dialog has a dialog role (R): pass
- ✅: Each dialog can be closed by escape key (BP): pass
- ✅: Each modal dialog traps keyboard focus (R): pass
- ✅: Each modal dialog takes focus when opened (R): pass
- ✅: Focus is not lost when each dialog closes (R): pass
- ✅: Each modal dialog hides content behind it while open (R): pass
Sample 4 (Gemini 3.5 Pro Preview)
Instruction set: 2. Detailed Instructions
PASS | Latency 0.00s
Axe WCAG: 0
Assertions ✅
- ✅: Each dialog has a dialog role (R): pass
- ✅: Each dialog can be closed by escape key (BP): pass
- ✅: Each modal dialog traps keyboard focus (R): pass
- ✅: Each modal dialog takes focus when opened (R): pass
- ✅: Focus is not lost when each dialog closes (R): pass
- ✅: Each modal dialog hides content behind it while open (R): pass
GPT-5 Mini
| Instruction set | Samples | Passes | pass@1 | pass@5 | pass@10 |
|---|---|---|---|---|---|
| Control | 25 | 12 | 48% | 98% | 100% |
| 0. Minimal | 5 | 2 | 40% | 100% | 100% |
| 1. Basic | 5 | 3 | 60% | 100% | 100% |
| 2. Detailed Instructions | 5 | 4 | 80% | 100% | 100% |
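The report does not state how the pass@k figures are derived from the sample and pass counts. As a minimal sketch, assuming the commonly used unbiased estimator 1 - C(n-c, k) / C(n, k) for n samples with c passes, the GPT-5 Mini figures above can be reproduced as follows. The function name `pass_at_k` is hypothetical, and treating k larger than n as 100% whenever at least one sample passed is an assumption chosen to match the values shown.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Estimate the probability that at least one of k samples passes.

    n: total samples, c: passing samples, k: number of samples drawn.
    Assumed formula (not confirmed by the report): 1 - C(n-c, k) / C(n, k).
    """
    if c == 0:
        return 0.0  # no passing sample can ever be drawn
    if n - c < k:
        return 1.0  # fewer than k failing samples, so any draw of k includes a pass
    return 1.0 - comb(n - c, k) / comb(n, k)


# Control set: 25 samples, 12 passes
for k in (1, 5, 10):
    print(f"pass@{k}: {pass_at_k(25, 12, k):.0%}")
```

With 25 samples and 12 passes this gives the 48%, 98%, and 100% shown for the control set. The 5-sample instruction sets reach 100% at pass@5 because drawing all five samples necessarily includes a passing one.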
Sample 0 (GPT-5 Mini)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 6
Assertions ✅
- ✅: Each dialog has a dialog role (R): pass
- ✅: Each dialog can be closed by escape key (BP): pass
- ✅: Each modal dialog traps keyboard focus (R): pass
- ✅: Each modal dialog takes focus when opened (R): pass
- ✅: Focus is not lost when each dialog closes (R): pass
- ✅: Each modal dialog hides content behind it while open (R): pass
Axe WCAG Failures (6) ❌
- (3x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
- (3x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
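The color-contrast rule that fails this sample refers to the WCAG 2 AA thresholds: a contrast ratio of at least 4.5:1 for normal text and 3:1 for large text. The sketch below shows how that ratio is computed from two sRGB colors using the WCAG definition of relative luminance. The function names and the example colors are hypothetical and are not taken from the tested samples.

```python
def _linearize(channel: int) -> float:
    """Convert an 8-bit sRGB channel to its linear-light value."""
    c = channel / 255
    # 0.03928 is the threshold in the WCAG 2.0/2.1 text; for 8-bit channels
    # it gives the same result as the sRGB value 0.04045.
    return c / 12.92 if c <= 0.03928 else ((c + 0.055) / 1.055) ** 2.4


def relative_luminance(hex_color: str) -> float:
    """Relative luminance of a color given as '#rrggbb'."""
    h = hex_color.lstrip("#")
    r, g, b = (int(h[i:i + 2], 16) for i in (0, 2, 4))
    return 0.2126 * _linearize(r) + 0.7152 * _linearize(g) + 0.0722 * _linearize(b)


def contrast_ratio(fg: str, bg: str) -> float:
    """WCAG contrast ratio: (lighter + 0.05) / (darker + 0.05)."""
    lighter, darker = sorted(
        (relative_luminance(fg), relative_luminance(bg)), reverse=True
    )
    return (lighter + 0.05) / (darker + 0.05)


def meets_aa(fg: str, bg: str, large_text: bool = False) -> bool:
    """True when the pair meets the WCAG 2 AA minimum for its text size."""
    return contrast_ratio(fg, bg) >= (3.0 if large_text else 4.5)


# Hypothetical example: mid-gray text on a white background
print(round(contrast_ratio("#777777", "#ffffff"), 2))  # about 4.48, just under 4.5
print(meets_aa("#777777", "#ffffff"))
```

A pair that falls just short of 4.5:1, like the gray in the example, is still reported as a serious failure for normal-size text, even though it would satisfy the 3:1 minimum for large text.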
Sample 1 (GPT-5 Mini)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 2 | BP: 4
Assertions ❌
- ✅: Each dialog has a dialog role (R): pass
- ✅: Each dialog can be closed by escape key (BP): pass
- ✅: Each modal dialog traps keyboard focus (R): pass
- ✅: Each modal dialog takes focus when opened (R): pass
- ✅: Focus is not lost when each dialog closes (R): pass
- ❌: Each modal dialog hides content behind it while open (R): fail
Axe WCAG Failures (2) ❌
- (1x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
- (1x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (4) ⚠️
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 2 (GPT-5 Mini)
Instruction set: Control
PASS | Latency 0.00s
Axe WCAG: 0 | BP: 4
Assertions ✅
- ✅: Each dialog has a dialog role (R): pass
- ✅: Each dialog can be closed by escape key (BP): pass
- ✅: Each modal dialog traps keyboard focus (R): pass
- ✅: Each modal dialog takes focus when opened (R): pass
- ✅: Focus is not lost when each dialog closes (R): pass
- ✅: Each modal dialog hides content behind it while open (R): pass
Axe Best Practice Issues (4) ⚠️
- landmark-no-duplicate-banner (moderate): Ensure the document has at most one banner landmark (Best Practice - does not affect pass/fail)
- landmark-unique (moderate): Ensure landmarks are unique (Best Practice - does not affect pass/fail)
- landmark-no-duplicate-banner (moderate): Ensure the document has at most one banner landmark (Best Practice - does not affect pass/fail)
- landmark-unique (moderate): Ensure landmarks are unique (Best Practice - does not affect pass/fail)
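The sample above is marked PASS despite four best-practice issues, which matches the note that best-practice findings do not affect pass/fail. The report does not spell out the exact rule, so the following is only a sketch under assumptions: that a sample passes when it has zero Axe WCAG failures and every assertion tagged (R) passes, and that (R) means required while (BP) means best practice. The class and function names are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Assertion:
    name: str
    required: bool  # assumed: True for "(R)", False for "(BP)"
    passed: bool


@dataclass
class Sample:
    axe_wcag_failures: int
    axe_best_practice_issues: int
    assertions: list[Assertion] = field(default_factory=list)


def verdict(sample: Sample) -> str:
    """Assumed rule: best-practice findings never change the verdict."""
    required_ok = all(a.passed for a in sample.assertions if a.required)
    return "PASS" if sample.axe_wcag_failures == 0 and required_ok else "FAIL"


def dialog_assertions(hides_content: bool = True) -> list[Assertion]:
    """The six dialog assertions listed for each sample in this report."""
    return [
        Assertion("Each dialog has a dialog role", True, True),
        Assertion("Each dialog can be closed by escape key", False, True),
        Assertion("Each modal dialog traps keyboard focus", True, True),
        Assertion("Each modal dialog takes focus when opened", True, True),
        Assertion("Focus is not lost when each dialog closes", True, True),
        Assertion("Each modal dialog hides content behind it while open", True, hides_content),
    ]


# Mirrors the sample above: no Axe WCAG failures, four best-practice issues
print(verdict(Sample(0, 4, dialog_assertions())))
```

Under the same assumed rule, a sample with two color-contrast failures and all assertions passing is a FAIL, as is a sample with no Axe failures whose dialog does not hide the content behind it.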
Sample 3 (GPT-5 Mini)
Instruction set: Control
PASS | Latency 0.00s
Axe WCAG: 0
Assertions ✅
- ✅: Each dialog has a dialog role (R): pass
- ✅: Each dialog can be closed by escape key (BP): pass
- ✅: Each modal dialog traps keyboard focus (R): pass
- ✅: Each modal dialog takes focus when opened (R): pass
- ✅: Focus is not lost when each dialog closes (R): pass
- ✅: Each modal dialog hides content behind it while open (R): pass
Sample 4 (GPT-5 Mini)
Instruction set: Control
PASS | Latency 0.00s
Axe WCAG: 0
Assertions ✅
- ✅: Each dialog has a dialog role (R): pass
- ✅: Each dialog can be closed by escape key (BP): pass
- ✅: Each modal dialog traps keyboard focus (R): pass
- ✅: Each modal dialog takes focus when opened (R): pass
- ✅: Focus is not lost when each dialog closes (R): pass
- ✅: Each modal dialog hides content behind it while open (R): pass
Sample 5 (GPT-5 Mini)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 0
Assertions ❌
- ✅: Each dialog has a dialog role (R): pass
- ✅: Each dialog can be closed by escape key (BP): pass
- ✅: Each modal dialog traps keyboard focus (R): pass
- ✅: Each modal dialog takes focus when opened (R): pass
- ✅: Focus is not lost when each dialog closes (R): pass
- ❌: Each modal dialog hides content behind it while open (R): fail
Sample 6 (GPT-5 Mini)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 2
Assertions ❌
- ✅: Each dialog has a dialog role (R): pass
- ✅: Each dialog can be closed by escape key (BP): pass
- ✅: Each modal dialog traps keyboard focus (R): pass
- ✅: Each modal dialog takes focus when opened (R): pass
- ✅: Focus is not lost when each dialog closes (R): pass
- ❌: Each modal dialog hides content behind it while open (R): fail
Axe WCAG Failures (2) ❌
- (1x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
- (1x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Sample 7 (GPT-5 Mini)
Instruction set: Control
PASS | Latency 0.00s
Axe WCAG: 0
Assertions ✅
- ✅: Each dialog has a dialog role (R): pass
- ✅: Each dialog can be closed by escape key (BP): pass
- ✅: Each modal dialog traps keyboard focus (R): pass
- ✅: Each modal dialog takes focus when opened (R): pass
- ✅: Focus is not lost when each dialog closes (R): pass
- ✅: Each modal dialog hides content behind it while open (R): pass
Sample 8 (GPT-5 Mini)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 0
Assertions ❌
- ❌: Each dialog has a dialog role (R): fail - locator.click: Timeout 30000ms exceeded. Call log: - waiting for locator('.trigger').filter({ visible: true }).first() - locator resolved to <button type="button" class="trigger">Open Modal</button> - attempting click action - 2 × waiting for element to be visible, enabled and stable - element is visible, enabled and stable - scrolling into view if needed - done scrolling - <div>…</div> from <div hidden="" id="modal" class="modal" role="dialog" aria-modal="true" aria-labelledby="modal-title" aria-describedby="modal-desc">…</div> subtree intercepts pointer events - retrying click action - waiting 20ms - 2 × waiting for element to be visible, enabled and stable - element is visible, enabled and stable - scrolling into view if needed - done scrolling - <div id="modal-desc" class="modal__body">…</div> from <div hidden="" id="modal" class="modal" role="dialog" aria-modal="true" aria-labelledby="modal-title" aria-describedby="modal-desc">…</div> subtree intercepts pointer events - retrying click action - waiting 100ms - 14 × waiting for element to be visible, enabled and stable - element is visible, enabled and stable - scrolling into view if needed - done scrolling - <div id="modal-desc" class="modal__body">…</div> from <div hidden="" id="modal" class="modal" role="dialog" aria-modal="true" aria-labelledby="modal-title" aria-describedby="modal-desc">…</div> subtree intercepts pointer events - retrying click action - waiting 500ms - waiting for element to be visible, enabled and stable - element is visible, enabled and stable - scrolling into view if needed - done scrolling - <div>…</div> from <div hidden="" id="modal" class="modal" role="dialog" aria-modal="true" aria-labelledby="modal-title" aria-describedby="modal-desc">…</div> subtree intercepts pointer events - retrying click action - waiting 500ms - waiting for element to be visible, enabled and stable - element is visible, enabled and stable - scrolling into view if needed - done scrolling - <div id="modal-desc" class="modal__body">…</div> from <div hidden="" id="modal" class="modal" role="dialog" aria-modal="true" aria-labelledby="modal-title" aria-describedby="modal-desc">…</div> subtree intercepts pointer events - retrying click action - waiting 500ms - waiting for element to be visible, enabled and stable - element is visible, enabled and stable - scrolling into view if needed - done scrolling - <div id="modal-desc" class="modal__body">…</div> from <div hidden="" id="modal" class="modal" role="dialog" aria-modal="true" aria-labelledby="modal-title" aria-describedby="modal-desc">…</div> subtree intercepts pointer events - retrying click action - waiting 500ms - waiting for element to be visible, enabled and stable - element is visible, enabled and stable - scrolling into view if needed - done scrolling - <div id="modal-desc" class="modal__body">…</div> from <div hidden="" id="modal" class="modal" role="dialog" aria-modal="true" aria-labelledby="modal-title" aria-describedby="modal-desc">…</div> subtree intercepts pointer events - retrying click action - waiting 500ms - waiting for element to be visible, enabled and stable - element is visible, enabled and stable - scrolling into view if needed - done scrolling - <div>…</div> from <div hidden="" id="modal" class="modal" role="dialog" aria-modal="true" aria-labelledby="modal-title" aria-describedby="modal-desc">…</div> subtree intercepts pointer events - retrying click action - waiting 500ms
- ❌: Each dialog can be closed by escape key (BP): fail - utils is not defined
- ❌: Each modal dialog traps keyboard focus (R): fail - utils is not defined
- ❌: Each modal dialog takes focus when opened (R): fail - utils is not defined
- ❌: Focus is not lost when each dialog closes (R): fail - utils is not defined
- ❌: Each modal dialog hides content behind it while open (R): fail - utils is not defined
Sample 9 (GPT-5 Mini)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 0
Assertions ❌
- ✅: Each dialog has a dialog role (R): pass
- ✅: Each dialog can be closed by escape key (BP): pass
- ✅: Each modal dialog traps keyboard focus (R): pass
- ❌: Each modal dialog takes focus when opened (R): fail
- ✅: Focus is not lost when each dialog closes (R): pass
- ❌: Each modal dialog hides content behind it while open (R): fail
Sample 10 (GPT-5 Mini)
Instruction set: Control
PASS | Latency 0.00s
Axe WCAG: 0
Assertions ✅
- ✅: Each dialog has a dialog role (R): pass
- ✅: Each dialog can be closed by escape key (BP): pass
- ✅: Each modal dialog traps keyboard focus (R): pass
- ✅: Each modal dialog takes focus when opened (R): pass
- ✅: Focus is not lost when each dialog closes (R): pass
- ✅: Each modal dialog hides content behind it while open (R): pass
Sample 11 (GPT-5 Mini)
Instruction set: Control
PASS | Latency 0.00s
Axe WCAG: 0 | BP: 4
Assertions ✅
- ✅: Each dialog has a dialog role (R): pass
- ✅: Each dialog can be closed by escape key (BP): pass
- ✅: Each modal dialog traps keyboard focus (R): pass
- ✅: Each modal dialog takes focus when opened (R): pass
- ✅: Focus is not lost when each dialog closes (R): pass
- ✅: Each modal dialog hides content behind it while open (R): pass
Axe Best Practice Issues (4) ⚠️
- landmark-no-duplicate-banner (moderate): Ensure the document has at most one banner landmark (Best Practice - does not affect pass/fail)
- landmark-unique (moderate): Ensure landmarks are unique (Best Practice - does not affect pass/fail)
- landmark-no-duplicate-banner (moderate): Ensure the document has at most one banner landmark (Best Practice - does not affect pass/fail)
- landmark-unique (moderate): Ensure landmarks are unique (Best Practice - does not affect pass/fail)
Sample 12 (GPT-5 Mini)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 2 | BP: 6
Assertions ❌
- ✅: Each dialog has a dialog role (R): pass
- ✅: Each dialog can be closed by escape key (BP): pass
- ✅: Each modal dialog traps keyboard focus (R): pass
- ❌: Each modal dialog takes focus when opened (R): fail
- ✅: Focus is not lost when each dialog closes (R): pass
- ❌: Each modal dialog hides content behind it while open (R): fail
Axe WCAG Failures (2) ❌
- (1x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
- (1x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (6) ⚠️
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 13 (GPT-5 Mini)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 2
Assertions ❌
- ✅: Each dialog has a dialog role (R): pass
- ✅: Each dialog can be closed by escape key (BP): pass
- ✅: Each modal dialog traps keyboard focus (R): pass
- ✅: Each modal dialog takes focus when opened (R): pass
- ✅: Focus is not lost when each dialog closes (R): pass
- ❌: Each modal dialog hides content behind it while open (R): fail
Axe WCAG Failures (2) ❌
- (1x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
- (1x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Sample 14 (GPT-5 Mini)
Instruction set: Control
PASS | Latency 0.00s
Axe WCAG: 0
Assertions ✅
- ✅: Each dialog has a dialog role (R): pass
- ✅: Each dialog can be closed by escape key (BP): pass
- ✅: Each modal dialog traps keyboard focus (R): pass
- ✅: Each modal dialog takes focus when opened (R): pass
- ✅: Focus is not lost when each dialog closes (R): pass
- ✅: Each modal dialog hides content behind it while open (R): pass
Sample 15 (GPT-5 Mini)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 2
Assertions ✅
- ✅: Each dialog has a dialog role (R): pass
- ✅: Each dialog can be closed by escape key (BP): pass
- ✅: Each modal dialog traps keyboard focus (R): pass
- ✅: Each modal dialog takes focus when opened (R): pass
- ✅: Focus is not lost when each dialog closes (R): pass
- ✅: Each modal dialog hides content behind it while open (R): pass
Axe WCAG Failures (2) ❌
- (1x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
- (1x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Sample 16 (GPT-5 Mini)
Instruction set: Control
PASS | Latency 0.00s
Axe WCAG: 0
Assertions ✅
- ✅: Each dialog has a dialog role (R): pass
- ✅: Each dialog can be closed by escape key (BP): pass
- ✅: Each modal dialog traps keyboard focus (R): pass
- ✅: Each modal dialog takes focus when opened (R): pass
- ✅: Focus is not lost when each dialog closes (R): pass
- ✅: Each modal dialog hides content behind it while open (R): pass
Sample 17 (GPT-5 Mini)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 2
Assertions ✅
- ✅: Each dialog has a dialog role (R): pass
- ✅: Each dialog can be closed by escape key (BP): pass
- ✅: Each modal dialog traps keyboard focus (R): pass
- ✅: Each modal dialog takes focus when opened (R): pass
- ✅: Focus is not lost when each dialog closes (R): pass
- ✅: Each modal dialog hides content behind it while open (R): pass
Axe WCAG Failures (2) ❌
- (1x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
- (1x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Sample 18 (GPT-5 Mini)
Instruction set: Control
PASS | Latency 0.00s
Axe WCAG: 0 | BP: 4
Assertions ✅
- ✅: Each dialog has a dialog role (R): pass
- ✅: Each dialog can be closed by escape key (BP): pass
- ✅: Each modal dialog traps keyboard focus (R): pass
- ✅: Each modal dialog takes focus when opened (R): pass
- ✅: Focus is not lost when each dialog closes (R): pass
- ✅: Each modal dialog hides content behind it while open (R): pass
Axe Best Practice Issues (4) ⚠️
- landmark-no-duplicate-banner (moderate): Ensure the document has at most one banner landmark (Best Practice - does not affect pass/fail)
- landmark-unique (moderate): Ensure landmarks are unique (Best Practice - does not affect pass/fail)
- landmark-no-duplicate-banner (moderate): Ensure the document has at most one banner landmark (Best Practice - does not affect pass/fail)
- landmark-unique (moderate): Ensure landmarks are unique (Best Practice - does not affect pass/fail)
Sample 19 (GPT-5 Mini)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 0 | BP: 4
Assertions ❌
- ✅: Each dialog has a dialog role (R): pass
- ✅: Each dialog can be closed by escape key (BP): pass
- ✅: Each modal dialog traps keyboard focus (R): pass
- ✅: Each modal dialog takes focus when opened (R): pass
- ✅: Focus is not lost when each dialog closes (R): pass
- ❌: Each modal dialog hides content behind it while open (R): fail
Axe Best Practice Issues (4) ⚠️
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 20 (GPT-5 Mini)
Instruction set: Control
PASS | Latency 0.00s
Axe WCAG: 0
Assertions ✅
- ✅: Each dialog has a dialog role (R): pass
- ✅: Each dialog can be closed by escape key (BP): pass
- ✅: Each modal dialog traps keyboard focus (R): pass
- ✅: Each modal dialog takes focus when opened (R): pass
- ✅: Focus is not lost when each dialog closes (R): pass
- ✅: Each modal dialog hides content behind it while open (R): pass
Sample 21 (GPT-5 Mini)
Instruction set: Control
PASS | Latency 0.00s
Axe WCAG: 0
Assertions ✅
- ✅: Each dialog has a dialog role (R): pass
- ✅: Each dialog can be closed by escape key (BP): pass
- ✅: Each modal dialog traps keyboard focus (R): pass
- ✅: Each modal dialog takes focus when opened (R): pass
- ✅: Focus is not lost when each dialog closes (R): pass
- ✅: Each modal dialog hides content behind it while open (R): pass
Sample 22 (GPT-5 Mini)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 0
Assertions ❌
- ✅: Each dialog has a dialog role (R): pass
- ✅: Each dialog can be closed by escape key (BP): pass
- ✅: Each modal dialog traps keyboard focus (R): pass
- ✅: Each modal dialog takes focus when opened (R): pass
- ✅: Focus is not lost when each dialog closes (R): pass
- ❌: Each modal dialog hides content behind it while open (R): fail
Sample 23 (GPT-5 Mini)
Instruction set: Control
PASS | Latency 0.00s
Axe WCAG: 0
Assertions ✅
- ✅: Each dialog has a dialog role (R): pass
- ✅: Each dialog can be closed by escape key (BP): pass
- ✅: Each modal dialog traps keyboard focus (R): pass
- ✅: Each modal dialog takes focus when opened (R): pass
- ✅: Focus is not lost when each dialog closes (R): pass
- ✅: Each modal dialog hides content behind it while open (R): pass
Sample 24 (GPT-5 Mini)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 0
Assertions ❌
- ✅: Each dialog has a dialog role (R): pass
- ✅: Each dialog can be closed by escape key (BP): pass
- ✅: Each modal dialog traps keyboard focus (R): pass
- ✅: Each modal dialog takes focus when opened (R): pass
- ✅: Focus is not lost when each dialog closes (R): pass
- ❌: Each modal dialog hides content behind it while open (R): fail
Sample 0 (GPT-5 Mini)
Instruction set: 1. Basic
PASS | Latency 0.00s
Axe WCAG: 0
Assertions ✅
- ✅: Each dialog has a dialog role (R): pass
- ✅: Each dialog can be closed by escape key (BP): pass
- ✅: Each modal dialog traps keyboard focus (R): pass
- ✅: Each modal dialog takes focus when opened (R): pass
- ✅: Focus is not lost when each dialog closes (R): pass
- ✅: Each modal dialog hides content behind it while open (R): pass
Sample 1 (GPT-5 Mini)
Instruction set: 1. Basic
FAIL | Latency 0.00s
Axe WCAG: 0
Assertions ❌
- ✅: Each dialog has a dialog role (R): pass
- ✅: Each dialog can be closed by escape key (BP): pass
- ✅: Each modal dialog traps keyboard focus (R): pass
- ✅: Each modal dialog takes focus when opened (R): pass
- ✅: Focus is not lost when each dialog closes (R): pass
- ❌: Each modal dialog hides content behind it while open (R): fail
Sample 2 (GPT-5 Mini)
Instruction set: 1. Basic
FAIL | Latency 0.00s
Axe WCAG: 2 | BP: 2
Assertions ✅
- ✅: Each dialog has a dialog role (R): pass
- ✅: Each dialog can be closed by escape key (BP): pass
- ✅: Each modal dialog traps keyboard focus (R): pass
- ✅: Each modal dialog takes focus when opened (R): pass
- ✅: Focus is not lost when each dialog closes (R): pass
- ✅: Each modal dialog hides content behind it while open (R): pass
Axe WCAG Failures (2) ❌
- (1x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
- (1x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (2) ⚠️
- heading-order (moderate): Ensure the order of headings is semantically correct (Best Practice - does not affect pass/fail)
- heading-order (moderate): Ensure the order of headings is semantically correct (Best Practice - does not affect pass/fail)
Sample 3 (GPT-5 Mini)
Instruction set: 1. Basic
PASS | Latency 0.00s
Axe WCAG: 0
Assertions ✅
- ✅: Each dialog has a dialog role (R): pass
- ✅: Each dialog can be closed by escape key (BP): pass
- ✅: Each modal dialog traps keyboard focus (R): pass
- ✅: Each modal dialog takes focus when opened (R): pass
- ✅: Focus is not lost when each dialog closes (R): pass
- ✅: Each modal dialog hides content behind it while open (R): pass
Sample 4 (GPT-5 Mini)
Instruction set: 1. Basic
PASS | Latency 0.00s
Axe WCAG: 0
Assertions ✅
- ✅: Each dialog has a dialog role (R): pass
- ✅: Each dialog can be closed by escape key (BP): pass
- ✅: Each modal dialog traps keyboard focus (R): pass
- ✅: Each modal dialog takes focus when opened (R): pass
- ✅: Focus is not lost when each dialog closes (R): pass
- ✅: Each modal dialog hides content behind it while open (R): pass
Sample 0 (GPT-5 Mini)
Instruction set: 0. Minimal
PASS | Latency 0.00s
Axe WCAG: 0
Assertions ✅
- ✅: Each dialog has a dialog role (R): pass
- ✅: Each dialog can be closed by escape key (BP): pass
- ✅: Each modal dialog traps keyboard focus (R): pass
- ✅: Each modal dialog takes focus when opened (R): pass
- ✅: Focus is not lost when each dialog closes (R): pass
- ✅: Each modal dialog hides content behind it while open (R): pass
Sample 1 (GPT-5 Mini)
Instruction set: 0. Minimal
FAIL | Latency 0.00s
Axe WCAG: 2
Assertions ✅
- ✅: Each dialog has a dialog role (R): pass
- ✅: Each dialog can be closed by escape key (BP): pass
- ✅: Each modal dialog traps keyboard focus (R): pass
- ✅: Each modal dialog takes focus when opened (R): pass
- ✅: Focus is not lost when each dialog closes (R): pass
- ✅: Each modal dialog hides content behind it while open (R): pass
Axe WCAG Failures (2) ❌
- (1x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
- (1x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Sample 2 (GPT-5 Mini)
Instruction set: 0. Minimal
PASS | Latency 0.00s
Axe WCAG: 0
Assertions ✅
- ✅: Each dialog has a dialog role (R): pass
- ✅: Each dialog can be closed by escape key (BP): pass
- ✅: Each modal dialog traps keyboard focus (R): pass
- ✅: Each modal dialog takes focus when opened (R): pass
- ✅: Focus is not lost when each dialog closes (R): pass
- ✅: Each modal dialog hides content behind it while open (R): pass
Sample 3 (GPT-5 Mini)
Instruction set: 0. Minimal
FAIL | Latency 0.00s
Axe WCAG: 0
Assertions ❌
- ✅: Each dialog has a dialog role (R): pass
- ✅: Each dialog can be closed by escape key (BP): pass
- ✅: Each modal dialog traps keyboard focus (R): pass
- ❌: Each modal dialog takes focus when opened (R): fail
- ✅: Focus is not lost when each dialog closes (R): pass
- ✅: Each modal dialog hides content behind it while open (R): pass
Sample 4 (GPT-5 Mini)
Instruction set: 0. Minimal
FAIL | Latency 0.00s
Axe WCAG: 0
Assertions ❌
- ✅: Each dialog has a dialog role (R): pass
- ✅: Each dialog can be closed by escape key (BP): pass
- ✅: Each modal dialog traps keyboard focus (R): pass
- ✅: Each modal dialog takes focus when opened (R): pass
- ✅: Focus is not lost when each dialog closes (R): pass
- ❌: Each modal dialog hides content behind it while open (R): fail
Sample 0 (GPT-5 Mini)
Instruction set: 2. Detailed Instructions
PASS | Latency 0.00s
Axe WCAG: 0
Assertions ✅
- ✅: Each dialog has a dialog role (R): pass
- ✅: Each dialog can be closed by escape key (BP): pass
- ✅: Each modal dialog traps keyboard focus (R): pass
- ✅: Each modal dialog takes focus when opened (R): pass
- ✅: Focus is not lost when each dialog closes (R): pass
- ✅: Each modal dialog hides content behind it while open (R): pass
Sample 1 (GPT-5 Mini)
Instruction set: 2. Detailed Instructions
PASS | Latency 0.00s
Axe WCAG: 0
Assertions ✅
- ✅: Each dialog has a dialog role (R): pass
- ✅: Each dialog can be closed by escape key (BP): pass
- ✅: Each modal dialog traps keyboard focus (R): pass
- ✅: Each modal dialog takes focus when opened (R): pass
- ✅: Focus is not lost when each dialog closes (R): pass
- ✅: Each modal dialog hides content behind it while open (R): pass
Sample 2 (GPT-5 Mini)
Instruction set: 2. Detailed Instructions
FAIL | Latency 0.00s
Axe WCAG: 0
Assertions ❌
- ✅: Each dialog has a dialog role (R): pass
- ✅: Each dialog can be closed by escape key (BP): pass
- ✅: Each modal dialog traps keyboard focus (R): pass
- ❌: Each modal dialog takes focus when opened (R): fail
- ✅: Focus is not lost when each dialog closes (R): pass
- ✅: Each modal dialog hides content behind it while open (R): pass
Sample 3 (GPT-5 Mini)
Instruction set: 2. Detailed Instructions
PASS | Latency 0.00s
Axe WCAG: 0 | BP: 2
Assertions ✅
- ✅: Each dialog has a dialog role (R): pass
- ✅: Each dialog can be closed by escape key (BP): pass
- ✅: Each modal dialog traps keyboard focus (R): pass
- ✅: Each modal dialog takes focus when opened (R): pass
- ✅: Focus is not lost when each dialog closes (R): pass
- ✅: Each modal dialog hides content behind it while open (R): pass
Axe Best Practice Issues (2) ⚠️
- aria-dialog-name (serious): Ensure every ARIA dialog and alertdialog node has an accessible name (Best Practice - does not affect pass/fail)
- aria-dialog-name (serious): Ensure every ARIA dialog and alertdialog node has an accessible name (Best Practice - does not affect pass/fail)
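The PASS/FAIL line in each sample block appears to follow one rule: a sample passes only when it has zero Axe WCAG failures and every required (R) assertion passes, while best-practice (BP) items are reported but not counted. The Python sketch below encodes that reading. `SampleResult` and `verdict` are hypothetical names, and treating the (BP) Escape-key assertion as non-blocking is an assumption inferred from the "does not affect pass/fail" note on the Axe best-practice items, not something the report states for assertions.

```python
from dataclasses import dataclass, field


@dataclass
class SampleResult:
    """One sample block from the report (hypothetical structure)."""
    axe_wcag_failures: int
    axe_best_practice_issues: int = 0
    # Maps assertion name -> (kind, passed), where kind is "R" or "BP".
    assertions: dict = field(default_factory=dict)


def verdict(sample: SampleResult) -> str:
    """Return "PASS" or "FAIL" under the assumed rule."""
    # Only required (R) assertions count; BP assertions are ignored here.
    required_ok = all(
        passed for kind, passed in sample.assertions.values() if kind == "R"
    )
    return "PASS" if sample.axe_wcag_failures == 0 and required_ok else "FAIL"


# Sample 3 (GPT-5 Mini, 2. Detailed Instructions): two best-practice issues only.
bp_only = SampleResult(
    axe_wcag_failures=0,
    axe_best_practice_issues=2,
    assertions={"Each dialog has a dialog role": ("R", True)},
)

# Sample 1 (GPT-5 Mini, 0. Minimal): assertions pass, two color-contrast failures.
contrast = SampleResult(
    axe_wcag_failures=2,
    assertions={"Each dialog has a dialog role": ("R", True)},
)

print(verdict(bp_only))   # PASS
print(verdict(contrast))  # FAIL
```

Both verdicts match the report: best-practice issues alone leave a sample passing, and any Axe WCAG failure fails it even when all assertions pass.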
Sample 4 (GPT-5 Mini)
Instruction set: 2. Detailed Instructions
PASS | Latency 0.00s
Axe WCAG: 0
Assertions ✅
- ✅: Each dialog has a dialog role (R): pass
- ✅: Each dialog can be closed by escape key (BP): pass
- ✅: Each modal dialog traps keyboard focus (R): pass
- ✅: Each modal dialog takes focus when opened (R): pass
- ✅: Focus is not lost when each dialog closes (R): pass
- ✅: Each modal dialog hides content behind it while open (R): pass
GPT-5.2
Pass rate per instruction-set filter: 16%, 100%, 100%, 100%
Aggregates are shown when filtering to a specific instruction set.
Samples: 25 | Passes: 4 (Control)
| pass@1 | pass@5 | pass@10 |
|---|---|---|
| 16% | 62% | 89% |
Samples: 5 | Passes: 5
| pass@1 | pass@5 | pass@10 |
|---|---|---|
| 100% | 100% | 100% |
Samples: 5 | Passes: 5
| pass@1 | pass@5 | pass@10 |
|---|---|---|
| 100% | 100% | 100% |
Samples: 5 | Passes: 5
| pass@1 | pass@5 | pass@10 |
|---|---|---|
| 100% | 100% | 100% |
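The report does not state how pass@k is computed. A minimal sketch, assuming the common unbiased estimator 1 - C(n-c, k) / C(n, k) for n samples with c passes, reproduces the figures in the tables above. The function name `pass_at_k` is hypothetical, and clamping k to n when fewer than k samples exist is an assumption made to match the 5-sample tables.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Probability that at least one of k samples drawn without
    replacement from n samples (c of which pass) is a pass."""
    k = min(k, n)  # assumption: cannot draw more samples than exist
    if n - c < k:
        # Fewer than k failing samples exist, so every draw of k includes a pass.
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)


# The 25-sample, 4-pass table above.
print([round(100 * pass_at_k(25, 4, k)) for k in (1, 5, 10)])  # [16, 62, 89]
```

The same function gives 100% for any k on a 5-sample filter where all 5 samples pass.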
Sample 0 (GPT-5.2)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 0
Assertions ❌
- ✅: Each dialog has a dialog role (R): pass
- ✅: Each dialog can be closed by escape key (BP): pass
- ✅: Each modal dialog traps keyboard focus (R): pass
- ✅: Each modal dialog takes focus when opened (R): pass
- ✅: Focus is not lost when each dialog closes (R): pass
- ❌: Each modal dialog hides content behind it while open (R): fail
Sample 1 (GPT-5.2)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 0
Assertions ❌
- ✅: Each dialog has a dialog role (R): pass
- ✅: Each dialog can be closed by escape key (BP): pass
- ✅: Each modal dialog traps keyboard focus (R): pass
- ✅: Each modal dialog takes focus when opened (R): pass
- ✅: Focus is not lost when each dialog closes (R): pass
- ❌: Each modal dialog hides content behind it while open (R): fail
Sample 2 (GPT-5.2)
Instruction set: Control
PASS | Latency 0.00s
Axe WCAG: 0
Assertions ✅
- ✅: Each dialog has a dialog role (R): pass
- ✅: Each dialog can be closed by escape key (BP): pass
- ✅: Each modal dialog traps keyboard focus (R): pass
- ✅: Each modal dialog takes focus when opened (R): pass
- ✅: Focus is not lost when each dialog closes (R): pass
- ✅: Each modal dialog hides content behind it while open (R): pass
Sample 3 (GPT-5.2)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 0 | BP: 4
Assertions ❌
- ✅: Each dialog has a dialog role (R): pass
- ✅: Each dialog can be closed by escape key (BP): pass
- ✅: Each modal dialog traps keyboard focus (R): pass
- ✅: Each modal dialog takes focus when opened (R): pass
- ✅: Focus is not lost when each dialog closes (R): pass
- ❌: Each modal dialog hides content behind it while open (R): fail
Axe Best Practice Issues (4) ⚠️
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 4 (GPT-5.2)
Instruction set: Control
PASS | Latency 0.00s
Axe WCAG: 0
Assertions ✅
- ✅: Each dialog has a dialog role (R): pass
- ✅: Each dialog can be closed by escape key (BP): pass
- ✅: Each modal dialog traps keyboard focus (R): pass
- ✅: Each modal dialog takes focus when opened (R): pass
- ✅: Focus is not lost when each dialog closes (R): pass
- ✅: Each modal dialog hides content behind it while open (R): pass
Sample 5 (GPT-5.2)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 0
Assertions ❌
- ✅: Each dialog has a dialog role (R): pass
- ✅: Each dialog can be closed by escape key (BP): pass
- ✅: Each modal dialog traps keyboard focus (R): pass
- ✅: Each modal dialog takes focus when opened (R): pass
- ✅: Focus is not lost when each dialog closes (R): pass
- ❌: Each modal dialog hides content behind it while open (R): fail
Sample 6 (GPT-5.2)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 0
Assertions ❌
- ✅: Each dialog has a dialog role (R): pass
- ✅: Each dialog can be closed by escape key (BP): pass
- ✅: Each modal dialog traps keyboard focus (R): pass
- ✅: Each modal dialog takes focus when opened (R): pass
- ✅: Focus is not lost when each dialog closes (R): pass
- ❌: Each modal dialog hides content behind it while open (R): fail
Sample 7 (GPT-5.2)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 0
Assertions ❌
- ✅: Each dialog has a dialog role (R): pass
- ✅: Each dialog can be closed by escape key (BP): pass
- ✅: Each modal dialog traps keyboard focus (R): pass
- ✅: Each modal dialog takes focus when opened (R): pass
- ✅: Focus is not lost when each dialog closes (R): pass
- ❌: Each modal dialog hides content behind it while open (R): fail
Sample 8 (GPT-5.2)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 0
Assertions ❌
- ✅: Each dialog has a dialog role (R): pass
- ✅: Each dialog can be closed by escape key (BP): pass
- ✅: Each modal dialog traps keyboard focus (R): pass
- ✅: Each modal dialog takes focus when opened (R): pass
- ✅: Focus is not lost when each dialog closes (R): pass
- ❌: Each modal dialog hides content behind it while open (R): fail
Sample 9 (GPT-5.2)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 0
Assertions ❌
- ✅: Each dialog has a dialog role (R): pass
- ✅: Each dialog can be closed by escape key (BP): pass
- ✅: Each modal dialog traps keyboard focus (R): pass
- ✅: Each modal dialog takes focus when opened (R): pass
- ✅: Focus is not lost when each dialog closes (R): pass
- ❌: Each modal dialog hides content behind it while open (R): fail
Sample 10 (GPT-5.2)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 0 | BP: 4
Assertions ❌
- ✅: Each dialog has a dialog role (R): pass
- ✅: Each dialog can be closed by escape key (BP): pass
- ✅: Each modal dialog traps keyboard focus (R): pass
- ✅: Each modal dialog takes focus when opened (R): pass
- ✅: Focus is not lost when each dialog closes (R): pass
- ❌: Each modal dialog hides content behind it while open (R): fail
Axe Best Practice Issues (4) ⚠️
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 11 (GPT-5.2)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 0
Assertions ❌
- ✅: Each dialog has a dialog role (R): pass
- ✅: Each dialog can be closed by escape key (BP): pass
- ✅: Each modal dialog traps keyboard focus (R): pass
- ✅: Each modal dialog takes focus when opened (R): pass
- ✅: Focus is not lost when each dialog closes (R): pass
- ❌: Each modal dialog hides content behind it while open (R): fail
Sample 12 (GPT-5.2)
Instruction set: Control
PASS | Latency 0.00s
Axe WCAG: 0
Assertions ✅
- ✅: Each dialog has a dialog role (R): pass
- ✅: Each dialog can be closed by escape key (BP): pass
- ✅: Each modal dialog traps keyboard focus (R): pass
- ✅: Each modal dialog takes focus when opened (R): pass
- ✅: Focus is not lost when each dialog closes (R): pass
- ✅: Each modal dialog hides content behind it while open (R): pass
Sample 13 (GPT-5.2)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 0
Assertions ❌
- ✅: Each dialog has a dialog role (R): pass
- ✅: Each dialog can be closed by escape key (BP): pass
- ✅: Each modal dialog traps keyboard focus (R): pass
- ✅: Each modal dialog takes focus when opened (R): pass
- ✅: Focus is not lost when each dialog closes (R): pass
- ❌: Each modal dialog hides content behind it while open (R): fail
Sample 14 (GPT-5.2)
Instruction set: Control
PASS | Latency 0.00s
Axe WCAG: 0
Assertions ✅
- ✅: Each dialog has a dialog role (R): pass
- ✅: Each dialog can be closed by escape key (BP): pass
- ✅: Each modal dialog traps keyboard focus (R): pass
- ✅: Each modal dialog takes focus when opened (R): pass
- ✅: Focus is not lost when each dialog closes (R): pass
- ✅: Each modal dialog hides content behind it while open (R): pass
Sample 15 (GPT-5.2)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 0
Assertions ❌
- ✅: Each dialog has a dialog role (R): pass
- ✅: Each dialog can be closed by escape key (BP): pass
- ✅: Each modal dialog traps keyboard focus (R): pass
- ✅: Each modal dialog takes focus when opened (R): pass
- ✅: Focus is not lost when each dialog closes (R): pass
- ❌: Each modal dialog hides content behind it while open (R): fail
Sample 16 (GPT-5.2)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 0
Assertions ❌
- ✅: Each dialog has a dialog role (R): pass
- ✅: Each dialog can be closed by escape key (BP): pass
- ✅: Each modal dialog traps keyboard focus (R): pass
- ✅: Each modal dialog takes focus when opened (R): pass
- ✅: Focus is not lost when each dialog closes (R): pass
- ❌: Each modal dialog hides content behind it while open (R): fail
Sample 17 (GPT-5.2)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 0
Assertions ❌
- ✅: Each dialog has a dialog role (R): pass
- ✅: Each dialog can be closed by escape key (BP): pass
- ✅: Each modal dialog traps keyboard focus (R): pass
- ✅: Each modal dialog takes focus when opened (R): pass
- ✅: Focus is not lost when each dialog closes (R): pass
- ❌: Each modal dialog hides content behind it while open (R): fail
Sample 18 (GPT-5.2)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 0
Assertions ❌
- ✅: Each dialog has a dialog role (R): pass
- ✅: Each dialog can be closed by escape key (BP): pass
- ✅: Each modal dialog traps keyboard focus (R): pass
- ✅: Each modal dialog takes focus when opened (R): pass
- ✅: Focus is not lost when each dialog closes (R): pass
- ❌: Each modal dialog hides content behind it while open (R): fail
Sample 19 (GPT-5.2)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 2
Assertions ✅
- ✅: Each dialog has a dialog role (R): pass
- ✅: Each dialog can be closed by escape key (BP): pass
- ✅: Each modal dialog traps keyboard focus (R): pass
- ✅: Each modal dialog takes focus when opened (R): pass
- ✅: Focus is not lost when each dialog closes (R): pass
- ✅: Each modal dialog hides content behind it while open (R): pass
Axe WCAG Failures (2) ❌
- (1x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
- (1x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Sample 20 (GPT-5.2)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 0
Assertions ❌
- ✅: Each dialog has a dialog role (R): pass
- ✅: Each dialog can be closed by escape key (BP): pass
- ✅: Each modal dialog traps keyboard focus (R): pass
- ✅: Each modal dialog takes focus when opened (R): pass
- ✅: Focus is not lost when each dialog closes (R): pass
- ❌: Each modal dialog hides content behind it while open (R): fail
Sample 21 (GPT-5.2)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 0
Assertions ❌
- ✅: Each dialog has a dialog role (R): pass
- ✅: Each dialog can be closed by escape key (BP): pass
- ✅: Each modal dialog traps keyboard focus (R): pass
- ✅: Each modal dialog takes focus when opened (R): pass
- ✅: Focus is not lost when each dialog closes (R): pass
- ❌: Each modal dialog hides content behind it while open (R): fail
Sample 22 (GPT-5.2)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 0
Assertions ❌
- ✅: Each dialog has a dialog role (R): pass
- ✅: Each dialog can be closed by escape key (BP): pass
- ✅: Each modal dialog traps keyboard focus (R): pass
- ✅: Each modal dialog takes focus when opened (R): pass
- ✅: Focus is not lost when each dialog closes (R): pass
- ❌: Each modal dialog hides content behind it while open (R): fail
Sample 23 (GPT-5.2)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 0
Assertions ❌
- ✅: Each dialog has a dialog role (R): pass
- ✅: Each dialog can be closed by escape key (BP): pass
- ✅: Each modal dialog traps keyboard focus (R): pass
- ✅: Each modal dialog takes focus when opened (R): pass
- ✅: Focus is not lost when each dialog closes (R): pass
- ❌: Each modal dialog hides content behind it while open (R): fail
Sample 24 (GPT-5.2)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 0
Assertions ❌
- ✅: Each dialog has a dialog role (R): pass
- ✅: Each dialog can be closed by escape key (BP): pass
- ✅: Each modal dialog traps keyboard focus (R): pass
- ✅: Each modal dialog takes focus when opened (R): pass
- ✅: Focus is not lost when each dialog closes (R): pass
- ❌: Each modal dialog hides content behind it while open (R): fail
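In the 25 GPT-5.2 Control samples above, every failed assertion is the same one: 20 samples fail "Each modal dialog hides content behind it while open", and one further sample fails on color contrast alone. A tally like that can be produced mechanically from the log. The sketch below assumes the bullet format used in this report (`- ❌: <name> (R): fail`); `tally_failed_assertions` is a hypothetical helper, not part of the benchmark tooling.

```python
from collections import Counter

FAIL_PREFIX = "- ❌:"
FAIL_SUFFIX = ": fail"


def tally_failed_assertions(log_text: str) -> Counter:
    """Count how often each assertion appears as a failed bullet."""
    failures = Counter()
    for raw in log_text.splitlines():
        line = raw.strip()
        if line.startswith(FAIL_PREFIX) and line.endswith(FAIL_SUFFIX):
            # Keep the assertion name, including its (R) or (BP) tag.
            name = line[len(FAIL_PREFIX):-len(FAIL_SUFFIX)].strip()
            failures[name] += 1
    return failures


excerpt = """\
Sample 0 (GPT-5.2)
- ✅: Each dialog has a dialog role (R): pass
- ❌: Each modal dialog hides content behind it while open (R): fail
Sample 1 (GPT-5.2)
- ❌: Each modal dialog hides content behind it while open (R): fail
"""

print(tally_failed_assertions(excerpt))
# Counter({'Each modal dialog hides content behind it while open (R)': 2})
```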
Sample 0 (GPT-5.2)
Instruction set: 1. Basic
PASS | Latency 0.00s
Axe WCAG: 0
Assertions ✅
- ✅: Each dialog has a dialog role (R): pass
- ✅: Each dialog can be closed by escape key (BP): pass
- ✅: Each modal dialog traps keyboard focus (R): pass
- ✅: Each modal dialog takes focus when opened (R): pass
- ✅: Focus is not lost when each dialog closes (R): pass
- ✅: Each modal dialog hides content behind it while open (R): pass
Sample 1 (GPT-5.2)
Instruction set: 1. Basic
PASS | Latency 0.00s
Axe WCAG: 0
Assertions ✅
- ✅: Each dialog has a dialog role (R): pass
- ✅: Each dialog can be closed by escape key (BP): pass
- ✅: Each modal dialog traps keyboard focus (R): pass
- ✅: Each modal dialog takes focus when opened (R): pass
- ✅: Focus is not lost when each dialog closes (R): pass
- ✅: Each modal dialog hides content behind it while open (R): pass
Sample 2 (GPT-5.2)
Instruction set: 1. Basic
PASS | Latency 0.00s
Axe WCAG: 0
Assertions ✅
- ✅: Each dialog has a dialog role (R): pass
- ✅: Each dialog can be closed by escape key (BP): pass
- ✅: Each modal dialog traps keyboard focus (R): pass
- ✅: Each modal dialog takes focus when opened (R): pass
- ✅: Focus is not lost when each dialog closes (R): pass
- ✅: Each modal dialog hides content behind it while open (R): pass
Sample 3 (GPT-5.2)
Instruction set: 1. Basic
PASS | Latency 0.00s
Axe WCAG: 0
Assertions ✅
- ✅: Each dialog has a dialog role (R): pass
- ✅: Each dialog can be closed by escape key (BP): pass
- ✅: Each modal dialog traps keyboard focus (R): pass
- ✅: Each modal dialog takes focus when opened (R): pass
- ✅: Focus is not lost when each dialog closes (R): pass
- ✅: Each modal dialog hides content behind it while open (R): pass
Sample 4 (GPT-5.2)
Instruction set: 1. Basic
PASS | Latency 0.00s
Axe WCAG: 0
Assertions ✅
- ✅: Each dialog has a dialog role (R): pass
- ✅: Each dialog can be closed by escape key (BP): pass
- ✅: Each modal dialog traps keyboard focus (R): pass
- ✅: Each modal dialog takes focus when opened (R): pass
- ✅: Focus is not lost when each dialog closes (R): pass
- ✅: Each modal dialog hides content behind it while open (R): pass
Sample 0 (GPT-5.2)
Instruction set: 0. Minimal
PASS | Latency 0.00s
Axe WCAG: 0
Assertions ✅
- ✅: Each dialog has a dialog role (R): pass
- ✅: Each dialog can be closed by escape key (BP): pass
- ✅: Each modal dialog traps keyboard focus (R): pass
- ✅: Each modal dialog takes focus when opened (R): pass
- ✅: Focus is not lost when each dialog closes (R): pass
- ✅: Each modal dialog hides content behind it while open (R): pass
Sample 1 (GPT-5.2)
Instruction set: 0. Minimal
PASS | Latency 0.00s
Axe WCAG: 0
Assertions ✅
- ✅: Each dialog has a dialog role (R): pass
- ✅: Each dialog can be closed by escape key (BP): pass
- ✅: Each modal dialog traps keyboard focus (R): pass
- ✅: Each modal dialog takes focus when opened (R): pass
- ✅: Focus is not lost when each dialog closes (R): pass
- ✅: Each modal dialog hides content behind it while open (R): pass
Sample 2 (GPT-5.2)
Instruction set: 0. Minimal
PASS | Latency 0.00s
Axe WCAG: 0
Assertions ✅
- ✅: Each dialog has a dialog role (R): pass
- ✅: Each dialog can be closed by escape key (BP): pass
- ✅: Each modal dialog traps keyboard focus (R): pass
- ✅: Each modal dialog takes focus when opened (R): pass
- ✅: Focus is not lost when each dialog closes (R): pass
- ✅: Each modal dialog hides content behind it while open (R): pass
Sample 3 (GPT-5.2)
Instruction set: 0. Minimal
PASS | Latency 0.00s
Axe WCAG: 0
Assertions ✅
- ✅: Each dialog has a dialog role (R): pass
- ✅: Each dialog can be closed by escape key (BP): pass
- ✅: Each modal dialog traps keyboard focus (R): pass
- ✅: Each modal dialog takes focus when opened (R): pass
- ✅: Focus is not lost when each dialog closes (R): pass
- ✅: Each modal dialog hides content behind it while open (R): pass
Sample 4 (GPT-5.2)
Instruction set: 0. Minimal
PASS | Latency 0.00s
Axe WCAG: 0
Assertions ✅
- ✅: Each dialog has a dialog role (R): pass
- ✅: Each dialog can be closed by escape key (BP): pass
- ✅: Each modal dialog traps keyboard focus (R): pass
- ✅: Each modal dialog takes focus when opened (R): pass
- ✅: Focus is not lost when each dialog closes (R): pass
- ✅: Each modal dialog hides content behind it while open (R): pass
Sample 0 (GPT-5.2)
Instruction set: 2. Detailed Instructions
PASS | Latency 0.00s
Axe WCAG: 0
Assertions ✅
- ✅: Each dialog has a dialog role (R): pass
- ✅: Each dialog can be closed by escape key (BP): pass
- ✅: Each modal dialog traps keyboard focus (R): pass
- ✅: Each modal dialog takes focus when opened (R): pass
- ✅: Focus is not lost when each dialog closes (R): pass
- ✅: Each modal dialog hides content behind it while open (R): pass
Sample 1 (GPT-5.2)
Instruction set: 2. Detailed Instructions
PASS | Latency 0.00s
Axe WCAG: 0
Assertions ✅
- ✅: Each dialog has a dialog role (R): pass
- ✅: Each dialog can be closed by escape key (BP): pass
- ✅: Each modal dialog traps keyboard focus (R): pass
- ✅: Each modal dialog takes focus when opened (R): pass
- ✅: Focus is not lost when each dialog closes (R): pass
- ✅: Each modal dialog hides content behind it while open (R): pass
Sample 2 (GPT-5.2)
Instruction set: 2. Detailed Instructions
PASS | Latency 0.00s
Axe WCAG: 0
Assertions ✅
- ✅: Each dialog has a dialog role (R): pass
- ✅: Each dialog can be closed by escape key (BP): pass
- ✅: Each modal dialog traps keyboard focus (R): pass
- ✅: Each modal dialog takes focus when opened (R): pass
- ✅: Focus is not lost when each dialog closes (R): pass
- ✅: Each modal dialog hides content behind it while open (R): pass
Sample 3 (GPT-5.2)
Instruction set: 2. Detailed Instructions
PASS | Latency 0.00s
Axe WCAG: 0
Assertions ✅
- ✅: Each dialog has a dialog role (R): pass
- ✅: Each dialog can be closed by escape key (BP): pass
- ✅: Each modal dialog traps keyboard focus (R): pass
- ✅: Each modal dialog takes focus when opened (R): pass
- ✅: Focus is not lost when each dialog closes (R): pass
- ✅: Each modal dialog hides content behind it while open (R): pass
Sample 4 (GPT-5.2)
Instruction set: 2. Detailed Instructions
PASS | Latency 0.00s
Axe WCAG: 0
Assertions ✅
- ✅: Each dialog has a dialog role (R): pass
- ✅: Each dialog can be closed by escape key (BP): pass
- ✅: Each modal dialog traps keyboard focus (R): pass
- ✅: Each modal dialog takes focus when opened (R): pass
- ✅: Focus is not lost when each dialog closes (R): pass
- ✅: Each modal dialog hides content behind it while open (R): pass
GPT-5.2 Codex
Pass rate per instruction-set filter: 0%, 0%, 60%, 100%
Aggregates are shown when filtering to a specific instruction set.
Samples: 25 | Passes: 0 (Control)
| pass@1 | pass@5 | pass@10 |
|---|---|---|
| 0% | 0% | 0% |
Samples: 5 | Passes: 0
| pass@1 | pass@5 | pass@10 |
|---|---|---|
| 0% | 0% | 0% |
Samples: 5 | Passes: 3
| pass@1 | pass@5 | pass@10 |
|---|---|---|
| 60% | 100% | 100% |
Samples: 5 | Passes: 5
| pass@1 | pass@5 | pass@10 |
|---|---|---|
| 100% | 100% | 100% |
Sample 0 (GPT-5.2 Codex)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 0
Assertions ❌
- ✅: Each dialog has a dialog role (R): pass
- ✅: Each dialog can be closed by escape key (BP): pass
- ✅: Each modal dialog traps keyboard focus (R): pass
- ❌: Each modal dialog takes focus when opened (R): fail
- ✅: Focus is not lost when each dialog closes (R): pass
- ❌: Each modal dialog hides content behind it while open (R): fail
Sample 1 (GPT-5.2 Codex)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 0
Assertions ❌
- ✅: Each dialog has a dialog role (R): pass
- ❌: Each dialog can be closed by escape key (BP): fail
- ✅: Each modal dialog traps keyboard focus (R): pass
- ❌: Each modal dialog takes focus when opened (R): fail
- ❌: Focus is not lost when each dialog closes (R): fail
- ❌: Each modal dialog hides content behind it while open (R): fail
Sample 2 (GPT-5.2 Codex)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 0
Assertions ❌
- ✅: Each dialog has a dialog role (R): pass
- ❌: Each dialog can be closed by escape key (BP): fail
- ✅: Each modal dialog traps keyboard focus (R): pass
- ❌: Each modal dialog takes focus when opened (R): fail
- ❌: Focus is not lost when each dialog closes (R): fail
- ❌: Each modal dialog hides content behind it while open (R): fail
Sample 3 (GPT-5.2 Codex)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 2
Assertions ❌
- ✅: Each dialog has a dialog role (R): pass
- ✅: Each dialog can be closed by escape key (BP): pass
- ✅: Each modal dialog traps keyboard focus (R): pass
- ❌: Each modal dialog takes focus when opened (R): fail
- ✅: Focus is not lost when each dialog closes (R): pass
- ❌: Each modal dialog hides content behind it while open (R): fail
Axe WCAG Failures (2) ❌
- (1x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
- (1x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Sample 4 (GPT-5.2 Codex)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 2
Assertions ❌
- ✅: Each dialog has a dialog role (R): pass
- ❌: Each dialog can be closed by escape key (BP): fail
- ✅: Each modal dialog traps keyboard focus (R): pass
- ❌: Each modal dialog takes focus when opened (R): fail
- ❌: Focus is not lost when each dialog closes (R): fail
- ❌: Each modal dialog hides content behind it while open (R): fail
Axe WCAG Failures (2) ❌
- (1x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
- (1x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Sample 5 (GPT-5.2 Codex)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 0
Assertions ❌
- ✅: Each dialog has a dialog role (R): pass
- ✅: Each dialog can be closed by escape key (BP): pass
- ✅: Each modal dialog traps keyboard focus (R): pass
- ✅: Each modal dialog takes focus when opened (R): pass
- ✅: Focus is not lost when each dialog closes (R): pass
- ❌: Each modal dialog hides content behind it while open (R): fail
Sample 6 (GPT-5.2 Codex)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 2
Assertions ❌
- ✅: Each dialog has a dialog role (R): pass
- ✅: Each dialog can be closed by escape key (BP): pass
- ✅: Each modal dialog traps keyboard focus (R): pass
- ❌: Each modal dialog takes focus when opened (R): fail
- ✅: Focus is not lost when each dialog closes (R): pass
- ❌: Each modal dialog hides content behind it while open (R): fail
Axe WCAG Failures (2) ❌
- (1x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
- (1x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Sample 7 (GPT-5.2 Codex)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 0
Assertions ❌
- ✅: Each dialog has a dialog role (R): pass
- ✅: Each dialog can be closed by escape key (BP): pass
- ✅: Each modal dialog traps keyboard focus (R): pass
- ❌: Each modal dialog takes focus when opened (R): fail
- ✅: Focus is not lost when each dialog closes (R): pass
- ❌: Each modal dialog hides content behind it while open (R): fail
Sample 8 (GPT-5.2 Codex)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 0
Assertions ❌
- ✅: Each dialog has a dialog role (R): pass
- ❌: Each dialog can be closed by escape key (BP): fail
- ✅: Each modal dialog traps keyboard focus (R): pass
- ❌: Each modal dialog takes focus when opened (R): fail
- ❌: Focus is not lost when each dialog closes (R): fail
- ❌: Each modal dialog hides content behind it while open (R): fail
Sample 9 (GPT-5.2 Codex)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 0
Assertions ❌
- ✅: Each dialog has a dialog role (R): pass
- ✅: Each dialog can be closed by escape key (BP): pass
- ✅: Each modal dialog traps keyboard focus (R): pass
- ❌: Each modal dialog takes focus when opened (R): fail
- ✅: Focus is not lost when each dialog closes (R): pass
- ❌: Each modal dialog hides content behind it while open (R): fail
Sample 10 (GPT-5.2 Codex)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 0
Assertions ❌
- ✅: Each dialog has a dialog role (R): pass
- ❌: Each dialog can be closed by escape key (BP): fail
- ✅: Each modal dialog traps keyboard focus (R): pass
- ❌: Each modal dialog takes focus when opened (R): fail
- ❌: Focus is not lost when each dialog closes (R): fail
- ❌: Each modal dialog hides content behind it while open (R): fail
Sample 11 (GPT-5.2 Codex)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 0
Assertions ❌
- ✅: Each dialog has a dialog role (R): pass
- ❌: Each dialog can be closed by escape key (BP): fail
- ✅: Each modal dialog traps keyboard focus (R): pass
- ❌: Each modal dialog takes focus when opened (R): fail
- ❌: Focus is not lost when each dialog closes (R): fail
- ❌: Each modal dialog hides content behind it while open (R): fail
Sample 12 (GPT-5.2 Codex)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 0
Assertions ❌
- ✅: Each dialog has a dialog role (R): pass
- ❌: Each dialog can be closed by escape key (BP): fail
- ✅: Each modal dialog traps keyboard focus (R): pass
- ❌: Each modal dialog takes focus when opened (R): fail
- ❌: Focus is not lost when each dialog closes (R): fail
- ❌: Each modal dialog hides content behind it while open (R): fail
Sample 13 (GPT-5.2 Codex)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 0
Assertions ❌
- ✅: Each dialog has a dialog role (R): pass
- ❌: Each dialog can be closed by escape key (BP): fail
- ✅: Each modal dialog traps keyboard focus (R): pass
- ❌: Each modal dialog takes focus when opened (R): fail
- ❌: Focus is not lost when each dialog closes (R): fail
- ❌: Each modal dialog hides content behind it while open (R): fail
Sample 14 (GPT-5.2 Codex)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 0
Assertions ❌
- ✅: Each dialog has a dialog role (R): pass
- ✅: Each dialog can be closed by escape key (BP): pass
- ✅: Each modal dialog traps keyboard focus (R): pass
- ❌: Each modal dialog takes focus when opened (R): fail
- ✅: Focus is not lost when each dialog closes (R): pass
- ❌: Each modal dialog hides content behind it while open (R): fail
Sample 15 (GPT-5.2 Codex)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 0
Assertions ❌
- ✅: Each dialog has a dialog role (R): pass
- ❌: Each dialog can be closed by escape key (BP): fail
- ✅: Each modal dialog traps keyboard focus (R): pass
- ❌: Each modal dialog takes focus when opened (R): fail
- ❌: Focus is not lost when each dialog closes (R): fail
- ❌: Each modal dialog hides content behind it while open (R): fail
Sample 16 (GPT-5.2 Codex)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 0 | BP: 2
Assertions ❌
- ✅: Each dialog has a dialog role (R): pass
- ✅: Each dialog can be closed by escape key (BP): pass
- ✅: Each modal dialog traps keyboard focus (R): pass
- ✅: Each modal dialog takes focus when opened (R): pass
- ✅: Focus is not lost when each dialog closes (R): pass
- ❌: Each modal dialog hides content behind it while open (R): fail
Axe Best Practice Issues (2) ⚠️
- aria-dialog-name (serious): Ensure every ARIA dialog and alertdialog node has an accessible name (Best Practice - does not affect pass/fail)
- aria-dialog-name (serious): Ensure every ARIA dialog and alertdialog node has an accessible name (Best Practice - does not affect pass/fail)
Sample 17 (GPT-5.2 Codex)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 0
Assertions ❌
- ✅: Each dialog has a dialog role (R): pass
- ✅: Each dialog can be closed by escape key (BP): pass
- ✅: Each modal dialog traps keyboard focus (R): pass
- ❌: Each modal dialog takes focus when opened (R): fail
- ✅: Focus is not lost when each dialog closes (R): pass
- ❌: Each modal dialog hides content behind it while open (R): fail
Sample 18 (GPT-5.2 Codex)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 0
Assertions ❌
- ✅: Each dialog has a dialog role (R): pass
- ❌: Each dialog can be closed by escape key (BP): fail
- ✅: Each modal dialog traps keyboard focus (R): pass
- ❌: Each modal dialog takes focus when opened (R): fail
- ❌: Focus is not lost when each dialog closes (R): fail
- ❌: Each modal dialog hides content behind it while open (R): fail
Sample 19 (GPT-5.2 Codex)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 0
Assertions ❌
- ✅: Each dialog has a dialog role (R): pass
- ❌: Each dialog can be closed by escape key (BP): fail
- ✅: Each modal dialog traps keyboard focus (R): pass
- ❌: Each modal dialog takes focus when opened (R): fail
- ❌: Focus is not lost when each dialog closes (R): fail
- ❌: Each modal dialog hides content behind it while open (R): fail
Sample 20 (GPT-5.2 Codex)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 0
Assertions ❌
- ✅: Each dialog has a dialog role (R): pass
- ❌: Each dialog can be closed by escape key (BP): fail
- ✅: Each modal dialog traps keyboard focus (R): pass
- ❌: Each modal dialog takes focus when opened (R): fail
- ❌: Focus is not lost when each dialog closes (R): fail
- ❌: Each modal dialog hides content behind it while open (R): fail
Sample 21 (GPT-5.2 Codex)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 0
Assertions ❌
- ✅: Each dialog has a dialog role (R): pass
- ✅: Each dialog can be closed by escape key (BP): pass
- ✅: Each modal dialog traps keyboard focus (R): pass
- ❌: Each modal dialog takes focus when opened (R): fail
- ✅: Focus is not lost when each dialog closes (R): pass
- ❌: Each modal dialog hides content behind it while open (R): fail
Sample 22 (GPT-5.2 Codex)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 0
Assertions ❌
- ✅: Each dialog has a dialog role (R): pass
- ✅: Each dialog can be closed by escape key (BP): pass
- ✅: Each modal dialog traps keyboard focus (R): pass
- ❌: Each modal dialog takes focus when opened (R): fail
- ✅: Focus is not lost when each dialog closes (R): pass
- ❌: Each modal dialog hides content behind it while open (R): fail
Sample 23 (GPT-5.2 Codex)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 0
Assertions ❌
- ✅: Each dialog has a dialog role (R): pass
- ✅: Each dialog can be closed by escape key (BP): pass
- ✅: Each modal dialog traps keyboard focus (R): pass
- ❌: Each modal dialog takes focus when opened (R): fail
- ✅: Focus is not lost when each dialog closes (R): pass
- ❌: Each modal dialog hides content behind it while open (R): fail
Sample 24 (GPT-5.2 Codex)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 0
Assertions ❌
- ✅: Each dialog has a dialog role (R): pass
- ❌: Each dialog can be closed by escape key (BP): fail
- ✅: Each modal dialog traps keyboard focus (R): pass
- ❌: Each modal dialog takes focus when opened (R): fail
- ❌: Focus is not lost when each dialog closes (R): fail
- ❌: Each modal dialog hides content behind it while open (R): fail
Sample 0 (GPT-5.2 Codex)
Instruction set: 1. Basic
PASS | Latency 0.00s
Axe WCAG: 0
Assertions ✅
- ✅: Each dialog has a dialog role (R): pass
- ✅: Each dialog can be closed by escape key (BP): pass
- ✅: Each modal dialog traps keyboard focus (R): pass
- ✅: Each modal dialog takes focus when opened (R): pass
- ✅: Focus is not lost when each dialog closes (R): pass
- ✅: Each modal dialog hides content behind it while open (R): pass
Sample 1 (GPT-5.2 Codex)
Instruction set: 1. Basic
PASS | Latency 0.00s
Axe WCAG: 0
Assertions ✅
- ✅: Each dialog has a dialog role (R): pass
- ✅: Each dialog can be closed by escape key (BP): pass
- ✅: Each modal dialog traps keyboard focus (R): pass
- ✅: Each modal dialog takes focus when opened (R): pass
- ✅: Focus is not lost when each dialog closes (R): pass
- ✅: Each modal dialog hides content behind it while open (R): pass
Sample 2 (GPT-5.2 Codex)
Instruction set: 1. Basic
FAIL | Latency 0.00s
Axe WCAG: 0
Assertions ❌
- ✅: Each dialog has a dialog role (R): pass
- ✅: Each dialog can be closed by escape key (BP): pass
- ✅: Each modal dialog traps keyboard focus (R): pass
- ✅: Each modal dialog takes focus when opened (R): pass
- ✅: Focus is not lost when each dialog closes (R): pass
- ❌: Each modal dialog hides content behind it while open (R): fail
Sample 3 (GPT-5.2 Codex)
Instruction set: 1. Basic
PASS | Latency 0.00s
Axe WCAG: 0
Assertions ✅
- ✅: Each dialog has a dialog role (R): pass
- ✅: Each dialog can be closed by escape key (BP): pass
- ✅: Each modal dialog traps keyboard focus (R): pass
- ✅: Each modal dialog takes focus when opened (R): pass
- ✅: Focus is not lost when each dialog closes (R): pass
- ✅: Each modal dialog hides content behind it while open (R): pass
Sample 4 (GPT-5.2 Codex)
Instruction set: 1. Basic
FAIL | Latency 0.00s
Axe WCAG: 0
Assertions ❌
- ✅: Each dialog has a dialog role (R): pass
- ✅: Each dialog can be closed by escape key (BP): pass
- ✅: Each modal dialog traps keyboard focus (R): pass
- ✅: Each modal dialog takes focus when opened (R): pass
- ✅: Focus is not lost when each dialog closes (R): pass
- ❌: Each modal dialog hides content behind it while open (R): fail
Sample 0 (GPT-5.2 Codex)
Instruction set: 0. Minimal
FAIL | Latency 0.00s
Axe WCAG: 0 | BP: 4
Assertions ❌
- ✅: Each dialog has a dialog role (R): pass
- ✅: Each dialog can be closed by escape key (BP): pass
- ✅: Each modal dialog traps keyboard focus (R): pass
- ✅: Each modal dialog takes focus when opened (R): pass
- ✅: Focus is not lost when each dialog closes (R): pass
- ❌: Each modal dialog hides content behind it while open (R): fail
Axe Best Practice Issues (4) ⚠️
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 1 (GPT-5.2 Codex)
Instruction set: 0. Minimal
FAIL | Latency 0.00s
Axe WCAG: 2
Assertions ❌
- ✅: Each dialog has a dialog role (R): pass
- ✅: Each dialog can be closed by escape key (BP): pass
- ✅: Each modal dialog traps keyboard focus (R): pass
- ✅: Each modal dialog takes focus when opened (R): pass
- ✅: Focus is not lost when each dialog closes (R): pass
- ❌: Each modal dialog hides content behind it while open (R): fail
Axe WCAG Failures (2) ❌
- (1x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
- (1x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Sample 2 (GPT-5.2 Codex)
Instruction set: 0. Minimal
FAIL | Latency 0.00s
Axe WCAG: 0
Assertions ❌
- ✅: Each dialog has a dialog role (R): pass
- ✅: Each dialog can be closed by escape key (BP): pass
- ✅: Each modal dialog traps keyboard focus (R): pass
- ✅: Each modal dialog takes focus when opened (R): pass
- ✅: Focus is not lost when each dialog closes (R): pass
- ❌: Each modal dialog hides content behind it while open (R): fail
Sample 3 (GPT-5.2 Codex)
Instruction set: 0. Minimal
FAIL | Latency 0.00s
Axe WCAG: 0
Assertions ❌
- ✅: Each dialog has a dialog role (R): pass
- ✅: Each dialog can be closed by escape key (BP): pass
- ✅: Each modal dialog traps keyboard focus (R): pass
- ✅: Each modal dialog takes focus when opened (R): pass
- ✅: Focus is not lost when each dialog closes (R): pass
- ❌: Each modal dialog hides content behind it while open (R): fail
Sample 4 (GPT-5.2 Codex)
Instruction set: 0. Minimal
FAIL | Latency 0.00s
Axe WCAG: 0
Assertions ❌
- ✅: Each dialog has a dialog role (R): pass
- ✅: Each dialog can be closed by escape key (BP): pass
- ✅: Each modal dialog traps keyboard focus (R): pass
- ✅: Each modal dialog takes focus when opened (R): pass
- ✅: Focus is not lost when each dialog closes (R): pass
- ❌: Each modal dialog hides content behind it while open (R): fail
Sample 0 (GPT-5.2 Codex)
Instruction set: 2. Detailed Instructions
PASS | Latency 0.00s
Axe WCAG: 0
Assertions ✅
- ✅: Each dialog has a dialog role (R): pass
- ✅: Each dialog can be closed by escape key (BP): pass
- ✅: Each modal dialog traps keyboard focus (R): pass
- ✅: Each modal dialog takes focus when opened (R): pass
- ✅: Focus is not lost when each dialog closes (R): pass
- ✅: Each modal dialog hides content behind it while open (R): pass
Sample 1 (GPT-5.2 Codex)
Instruction set: 2. Detailed Instructions
PASS | Latency 0.00s
Axe WCAG: 0
Assertions ✅
- ✅: Each dialog has a dialog role (R): pass
- ✅: Each dialog can be closed by escape key (BP): pass
- ✅: Each modal dialog traps keyboard focus (R): pass
- ✅: Each modal dialog takes focus when opened (R): pass
- ✅: Focus is not lost when each dialog closes (R): pass
- ✅: Each modal dialog hides content behind it while open (R): pass
Sample 2 (GPT-5.2 Codex)
Instruction set: 2. Detailed Instructions
PASS | Latency 0.00s
Axe WCAG: 0
Assertions ✅
- ✅: Each dialog has a dialog role (R): pass
- ✅: Each dialog can be closed by escape key (BP): pass
- ✅: Each modal dialog traps keyboard focus (R): pass
- ✅: Each modal dialog takes focus when opened (R): pass
- ✅: Focus is not lost when each dialog closes (R): pass
- ✅: Each modal dialog hides content behind it while open (R): pass
Sample 3 (GPT-5.2 Codex)
Instruction set: 2. Detailed Instructions
PASS | Latency 0.00s
Axe WCAG: 0
Assertions ✅
- ✅: Each dialog has a dialog role (R): pass
- ✅: Each dialog can be closed by escape key (BP): pass
- ✅: Each modal dialog traps keyboard focus (R): pass
- ✅: Each modal dialog takes focus when opened (R): pass
- ✅: Focus is not lost when each dialog closes (R): pass
- ✅: Each modal dialog hides content behind it while open (R): pass
Sample 4 (GPT-5.2 Codex)
Instruction set: 2. Detailed Instructions
PASS | Latency 0.00s
Axe WCAG: 0
Assertions ✅
- ✅: Each dialog has a dialog role (R): pass
- ✅: Each dialog can be closed by escape key (BP): pass
- ✅: Each modal dialog traps keyboard focus (R): pass
- ✅: Each modal dialog takes focus when opened (R): pass
- ✅: Focus is not lost when each dialog closes (R): pass
- ✅: Each modal dialog hides content behind it while open (R): pass
Grok 4 Fast Non-Reasoning
Aggregates are shown when filtering to a specific instruction set.
Samples: 25 | Passes: 0
| pass@1 | pass@5 | pass@10 |
|---|---|---|
| 0% | 0% | 0% |
Samples: 5 | Passes: 2
| pass@1 | pass@5 | pass@10 |
|---|---|---|
| 40% | 100% | 100% |
Samples: 5 | Passes: 0
| pass@1 | pass@5 | pass@10 |
|---|---|---|
| 0% | 0% | 0% |
Samples: 5 | Passes: 0
| pass@1 | pass@5 | pass@10 |
|---|---|---|
| 0% | 0% | 0% |
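The pass@k figures in these tables are consistent with the standard unbiased combinatorial estimator, 1 - C(n-c, k) / C(n, k), where n is the number of samples and c the number of passes. The sketch below is an illustration under that assumption, not the report's actual code; the function name `pass_at_k` is hypothetical, and the handling of k larger than n (0% with no passes, 100% otherwise) is inferred from the values shown above.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Estimate the probability that at least one of k samples passes.

    n: total samples, c: passing samples, k: samples drawn.
    Assumption: uses the unbiased estimator 1 - C(n-c, k) / C(n, k).
    """
    if c == 0:
        # No passing samples: no draw can contain a pass.
        return 0.0
    if n - c < k:
        # Fewer failing samples than draws (this also covers k > n):
        # every draw of k must include a pass.
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)


if __name__ == "__main__":
    # Rows taken from the aggregate tables above: (samples, passes).
    for n, c in [(25, 0), (5, 2), (5, 0)]:
        row = " | ".join(f"{round(pass_at_k(n, c, k) * 100)}%" for k in (1, 5, 10))
        print(f"Samples: {n} | Passes: {c} -> {row}")
```

With 5 samples and 2 passes this gives 40% for pass@1 and 100% for pass@5 and pass@10, matching the table above. With only 5 samples per instruction set, pass@5 and pass@10 can only be 0% or 100%, so pass@1 carries most of the information in those rows.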
Sample 0 (Grok 4 Fast Non-Reasoning)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 2 | BP: 6
Assertions ❌
- ❌: Each dialog has a dialog role (R): fail
- ❌: Each dialog can be closed by escape key (BP): fail - Unable to test because no dialog was found
- ❌: Each modal dialog traps keyboard focus (R): fail - Unable to test because no dialog was found
- ❌: Each modal dialog takes focus when opened (R): fail - Unable to test because no dialog was found
- ❌: Focus is not lost when each dialog closes (R): fail - Unable to test because no dialog was found
- ❌: Each modal dialog hides content behind it while open (R): fail - Unable to test because no dialog was found
Axe WCAG Failures (2) ❌
- (1x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
- (1x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (6) ⚠️
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 1 (Grok 4 Fast Non-Reasoning)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 2 | BP: 6
Assertions ❌
- ❌: Each dialog has a dialog role (R): fail
- ❌: Each dialog can be closed by escape key (BP): fail - Unable to test because no dialog was found
- ❌: Each modal dialog traps keyboard focus (R): fail - Unable to test because no dialog was found
- ❌: Each modal dialog takes focus when opened (R): fail - Unable to test because no dialog was found
- ❌: Focus is not lost when each dialog closes (R): fail - Unable to test because no dialog was found
- ❌: Each modal dialog hides content behind it while open (R): fail - Unable to test because no dialog was found
Axe WCAG Failures (2) ❌
- (1x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
- (1x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (6) ⚠️
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 2 (Grok 4 Fast Non-Reasoning)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 2 | BP: 6
Assertions ❌
- ❌: Each dialog has a dialog role (R): fail
- ❌: Each dialog can be closed by escape key (BP): fail - Unable to test because no dialog was found
- ❌: Each modal dialog traps keyboard focus (R): fail - Unable to test because no dialog was found
- ❌: Each modal dialog takes focus when opened (R): fail - Unable to test because no dialog was found
- ❌: Focus is not lost when each dialog closes (R): fail - Unable to test because no dialog was found
- ❌: Each modal dialog hides content behind it while open (R): fail - Unable to test because no dialog was found
Axe WCAG Failures (2) ❌
- (1x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
- (1x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (6) ⚠️
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 3 (Grok 4 Fast Non-Reasoning)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 2 | BP: 6
Assertions ❌
- ❌: Each dialog has a dialog role (R): fail
- ❌: Each dialog can be closed by escape key (BP): fail - Unable to test because no dialog was found
- ❌: Each modal dialog traps keyboard focus (R): fail - Unable to test because no dialog was found
- ❌: Each modal dialog takes focus when opened (R): fail - Unable to test because no dialog was found
- ❌: Focus is not lost when each dialog closes (R): fail - Unable to test because no dialog was found
- ❌: Each modal dialog hides content behind it while open (R): fail - Unable to test because no dialog was found
Axe WCAG Failures (2) ❌
- (1x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
- (1x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (6) ⚠️
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 4 (Grok 4 Fast Non-Reasoning)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 2 | BP: 10
Assertions ❌
- ❌: Each dialog has a dialog role (R): fail
- ❌: Each dialog can be closed by escape key (BP): fail - Unable to test because no dialog was found
- ❌: Each modal dialog traps keyboard focus (R): fail - Unable to test because no dialog was found
- ❌: Each modal dialog takes focus when opened (R): fail - Unable to test because no dialog was found
- ❌: Focus is not lost when each dialog closes (R): fail - Unable to test because no dialog was found
- ❌: Each modal dialog hides content behind it while open (R): fail - Unable to test because no dialog was found
Axe WCAG Failures (2) ❌
- (1x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
- (1x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (10) ⚠️
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 5 (Grok 4 Fast Non-Reasoning)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 2 | BP: 6
Assertions ❌
- ❌: Each dialog has a dialog role (R): fail
- ❌: Each dialog can be closed by escape key (BP): fail - Unable to test because no dialog was found
- ❌: Each modal dialog traps keyboard focus (R): fail - Unable to test because no dialog was found
- ❌: Each modal dialog takes focus when opened (R): fail - Unable to test because no dialog was found
- ❌: Focus is not lost when each dialog closes (R): fail - Unable to test because no dialog was found
- ❌: Each modal dialog hides content behind it while open (R): fail - Unable to test because no dialog was found
Axe WCAG Failures (2) ❌
- (1x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
- (1x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (6) ⚠️
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 6 (Grok 4 Fast Non-Reasoning)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 2 | BP: 10
Assertions ❌
- ❌: Each dialog has a dialog role (R): fail
- ❌: Each dialog can be closed by escape key (BP): fail - Unable to test because no dialog was found
- ❌: Each modal dialog traps keyboard focus (R): fail - Unable to test because no dialog was found
- ❌: Each modal dialog takes focus when opened (R): fail - Unable to test because no dialog was found
- ❌: Focus is not lost when each dialog closes (R): fail - Unable to test because no dialog was found
- ❌: Each modal dialog hides content behind it while open (R): fail - Unable to test because no dialog was found
Axe WCAG Failures (2) ❌
- (1x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
- (1x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (10) ⚠️
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 7 (Grok 4 Fast Non-Reasoning)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 2 | BP: 10
Assertions ❌
- ❌: Each dialog has a dialog role (R): fail
- ❌: Each dialog can be closed by escape key (BP): fail - Unable to test because no dialog was found
- ❌: Each modal dialog traps keyboard focus (R): fail - Unable to test because no dialog was found
- ❌: Each modal dialog takes focus when opened (R): fail - Unable to test because no dialog was found
- ❌: Focus is not lost when each dialog closes (R): fail - Unable to test because no dialog was found
- ❌: Each modal dialog hides content behind it while open (R): fail - Unable to test because no dialog was found
Axe WCAG Failures (2) ❌
- (1x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
- (1x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (10) ⚠️
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 8 (Grok 4 Fast Non-Reasoning)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 2 | BP: 10
Assertions ❌
- ❌: Each dialog has a dialog role (R): fail
- ❌: Each dialog can be closed by escape key (BP): fail - Unable to test because no dialog was found
- ❌: Each modal dialog traps keyboard focus (R): fail - Unable to test because no dialog was found
- ❌: Each modal dialog takes focus when opened (R): fail - Unable to test because no dialog was found
- ❌: Focus is not lost when each dialog closes (R): fail - Unable to test because no dialog was found
- ❌: Each modal dialog hides content behind it while open (R): fail - Unable to test because no dialog was found
Axe WCAG Failures (2) ❌
- (1x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
- (1x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (10) ⚠️
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 9 (Grok 4 Fast Non-Reasoning)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 2 | BP: 6
Assertions ❌
- ❌: Each dialog has a dialog role (R): fail
- ❌: Each dialog can be closed by escape key (BP): fail - Unable to test because no dialog was found
- ❌: Each modal dialog traps keyboard focus (R): fail - Unable to test because no dialog was found
- ❌: Each modal dialog takes focus when opened (R): fail - Unable to test because no dialog was found
- ❌: Focus is not lost when each dialog closes (R): fail - Unable to test because no dialog was found
- ❌: Each modal dialog hides content behind it while open (R): fail - Unable to test because no dialog was found
Axe WCAG Failures (2) ❌
- (1x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
- (1x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (6) ⚠️
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 10 (Grok 4 Fast Non-Reasoning)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 2 | BP: 10
Assertions ❌
- ❌: Each dialog has a dialog role (R): fail
- ❌: Each dialog can be closed by escape key (BP): fail - Unable to test because no dialog was found
- ❌: Each modal dialog traps keyboard focus (R): fail - Unable to test because no dialog was found
- ❌: Each modal dialog takes focus when opened (R): fail - Unable to test because no dialog was found
- ❌: Focus is not lost when each dialog closes (R): fail - Unable to test because no dialog was found
- ❌: Each modal dialog hides content behind it while open (R): fail - Unable to test because no dialog was found
Axe WCAG Failures (2) ❌
- (1x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
- (1x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (10) ⚠️
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 11 (Grok 4 Fast Non-Reasoning)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 2 | BP: 6
Assertions ❌
- ❌: Each dialog has a dialog role (R): fail
- ❌: Each dialog can be closed by escape key (BP): fail - Unable to test because no dialog was found
- ❌: Each modal dialog traps keyboard focus (R): fail - Unable to test because no dialog was found
- ❌: Each modal dialog takes focus when opened (R): fail - Unable to test because no dialog was found
- ❌: Focus is not lost when each dialog closes (R): fail - Unable to test because no dialog was found
- ❌: Each modal dialog hides content behind it while open (R): fail - Unable to test because no dialog was found
Axe WCAG Failures (2) ❌
- (1x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
- (1x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (6) ⚠️
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 12 (Grok 4 Fast Non-Reasoning)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 2 | BP: 10
Assertions ❌
- ❌: Each dialog has a dialog role (R): fail
- ❌: Each dialog can be closed by escape key (BP): fail - Unable to test because no dialog was found
- ❌: Each modal dialog traps keyboard focus (R): fail - Unable to test because no dialog was found
- ❌: Each modal dialog takes focus when opened (R): fail - Unable to test because no dialog was found
- ❌: Focus is not lost when each dialog closes (R): fail - Unable to test because no dialog was found
- ❌: Each modal dialog hides content behind it while open (R): fail - Unable to test because no dialog was found
Axe WCAG Failures (2) ❌
- (1x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
- (1x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (10) ⚠️
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 13 (Grok 4 Fast Non-Reasoning)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 2 | BP: 10
Assertions ❌
- ❌: Each dialog has a dialog role (R): fail
- ❌: Each dialog can be closed by escape key (BP): fail - Unable to test because no dialog was found
- ❌: Each modal dialog traps keyboard focus (R): fail - Unable to test because no dialog was found
- ❌: Each modal dialog takes focus when opened (R): fail - Unable to test because no dialog was found
- ❌: Focus is not lost when each dialog closes (R): fail - Unable to test because no dialog was found
- ❌: Each modal dialog hides content behind it while open (R): fail - Unable to test because no dialog was found
Axe WCAG Failures (2) ❌
- (1x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
- (1x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (10) ⚠️
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 14 (Grok 4 Fast Non-Reasoning)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 2 | BP: 6
Assertions ❌
- ❌: Each dialog has a dialog role (R): fail
- ❌: Each dialog can be closed by escape key (BP): fail - Unable to test because no dialog was found
- ❌: Each modal dialog traps keyboard focus (R): fail - Unable to test because no dialog was found
- ❌: Each modal dialog takes focus when opened (R): fail - Unable to test because no dialog was found
- ❌: Focus is not lost when each dialog closes (R): fail - Unable to test because no dialog was found
- ❌: Each modal dialog hides content behind it while open (R): fail - Unable to test because no dialog was found
Axe WCAG Failures (2) ❌
- (1x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
- (1x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (6) ⚠️
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 15 (Grok 4 Fast Non-Reasoning)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 2 | BP: 10
Assertions ❌
- ❌: Each dialog has a dialog role (R): fail
- ❌: Each dialog can be closed by escape key (BP): fail - Unable to test because no dialog was found
- ❌: Each modal dialog traps keyboard focus (R): fail - Unable to test because no dialog was found
- ❌: Each modal dialog takes focus when opened (R): fail - Unable to test because no dialog was found
- ❌: Focus is not lost when each dialog closes (R): fail - Unable to test because no dialog was found
- ❌: Each modal dialog hides content behind it while open (R): fail - Unable to test because no dialog was found
Axe WCAG Failures (2) ❌
- (1x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
- (1x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (10) ⚠️
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 16 (Grok 4 Fast Non-Reasoning)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 2 | BP: 12
Assertions ❌
- ❌: Each dialog has a dialog role (R): fail
- ❌: Each dialog can be closed by escape key (BP): fail - Unable to test because no dialog was found
- ❌: Each modal dialog traps keyboard focus (R): fail - Unable to test because no dialog was found
- ❌: Each modal dialog takes focus when opened (R): fail - Unable to test because no dialog was found
- ❌: Focus is not lost when each dialog closes (R): fail - Unable to test because no dialog was found
- ❌: Each modal dialog hides content behind it while open (R): fail - Unable to test because no dialog was found
Axe WCAG Failures (2) ❌
- (1x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
- (1x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (12) ⚠️
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 17 (Grok 4 Fast Non-Reasoning)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 2 | BP: 10
Assertions ❌
- ❌: Each dialog has a dialog role (R): fail
- ❌: Each dialog can be closed by escape key (BP): fail - Unable to test because no dialog was found
- ❌: Each modal dialog traps keyboard focus (R): fail - Unable to test because no dialog was found
- ❌: Each modal dialog takes focus when opened (R): fail - Unable to test because no dialog was found
- ❌: Focus is not lost when each dialog closes (R): fail - Unable to test because no dialog was found
- ❌: Each modal dialog hides content behind it while open (R): fail - Unable to test because no dialog was found
Axe WCAG Failures (2) ❌
- (1x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
- (1x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (10) ⚠️
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 18 (Grok 4 Fast Non-Reasoning)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 2 | BP: 6
Assertions ❌
- ❌: Each dialog has a dialog role (R): fail
- ❌: Each dialog can be closed by escape key (BP): fail - Unable to test because no dialog was found
- ❌: Each modal dialog traps keyboard focus (R): fail - Unable to test because no dialog was found
- ❌: Each modal dialog takes focus when opened (R): fail - Unable to test because no dialog was found
- ❌: Focus is not lost when each dialog closes (R): fail - Unable to test because no dialog was found
- ❌: Each modal dialog hides content behind it while open (R): fail - Unable to test because no dialog was found
Axe WCAG Failures (2) ❌
- (1x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
- (1x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (6) ⚠️
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 19 (Grok 4 Fast Non-Reasoning)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 2 | BP: 10
Assertions ❌
- ❌: Each dialog has a dialog role (R): fail
- ❌: Each dialog can be closed by escape key (BP): fail - Unable to test because no dialog was found
- ❌: Each modal dialog traps keyboard focus (R): fail - Unable to test because no dialog was found
- ❌: Each modal dialog takes focus when opened (R): fail - Unable to test because no dialog was found
- ❌: Focus is not lost when each dialog closes (R): fail - Unable to test because no dialog was found
- ❌: Each modal dialog hides content behind it while open (R): fail - Unable to test because no dialog was found
Axe WCAG Failures (2) ❌
- (1x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
- (1x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (10) ⚠️
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 20 (Grok 4 Fast Non-Reasoning)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 2 | BP: 4
Assertions ❌
- ✅: Each dialog has a dialog role (R): pass
- ✅: Each dialog can be closed by escape key (BP): pass
- ✅: Each modal dialog traps keyboard focus (R): pass
- ❌: Each modal dialog takes focus when opened (R): fail
- ✅: Focus is not lost when each dialog closes (R): pass
- ❌: Each modal dialog hides content behind it while open (R): fail
Axe WCAG Failures (2) ❌
- (1x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
- (1x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (4) ⚠️
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 21 (Grok 4 Fast Non-Reasoning)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 2 | BP: 10
Assertions ❌
- ❌: Each dialog has a dialog role (R): fail
- ❌: Each dialog can be closed by escape key (BP): fail - Unable to test because no dialog was found
- ❌: Each modal dialog traps keyboard focus (R): fail - Unable to test because no dialog was found
- ❌: Each modal dialog takes focus when opened (R): fail - Unable to test because no dialog was found
- ❌: Focus is not lost when each dialog closes (R): fail - Unable to test because no dialog was found
- ❌: Each modal dialog hides content behind it while open (R): fail - Unable to test because no dialog was found
Axe WCAG Failures (2) ❌
- (1x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
- (1x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (10) ⚠️
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 22 (Grok 4 Fast Non-Reasoning)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 4 | BP: 10
Assertions ❌
- ❌: Each dialog has a dialog role (R): fail
- ❌: Each dialog can be closed by escape key (BP): fail - Unable to test because no dialog was found
- ❌: Each modal dialog traps keyboard focus (R): fail - Unable to test because no dialog was found
- ❌: Each modal dialog takes focus when opened (R): fail - Unable to test because no dialog was found
- ❌: Focus is not lost when each dialog closes (R): fail - Unable to test because no dialog was found
- ❌: Each modal dialog hides content behind it while open (R): fail - Unable to test because no dialog was found
Axe WCAG Failures (4) ❌
- (2x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
- (2x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (10) ⚠️
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 23 (Grok 4 Fast Non-Reasoning)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 4 | BP: 8
Assertions ❌
- ❌: Each dialog has a dialog role (R): fail
- ❌: Each dialog can be closed by escape key (BP): fail - Unable to test because no dialog was found
- ❌: Each modal dialog traps keyboard focus (R): fail - Unable to test because no dialog was found
- ❌: Each modal dialog takes focus when opened (R): fail - Unable to test because no dialog was found
- ❌: Focus is not lost when each dialog closes (R): fail - Unable to test because no dialog was found
- ❌: Each modal dialog hides content behind it while open (R): fail - Unable to test because no dialog was found
Axe WCAG Failures (4) ❌
- (2x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
- (2x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (8) ⚠️
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 24 (Grok 4 Fast Non-Reasoning)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 2 | BP: 10
Assertions ❌
- ❌: Each dialog has a dialog role (R): fail
- ❌: Each dialog can be closed by escape key (BP): fail - Unable to test because no dialog was found
- ❌: Each modal dialog traps keyboard focus (R): fail - Unable to test because no dialog was found
- ❌: Each modal dialog takes focus when opened (R): fail - Unable to test because no dialog was found
- ❌: Focus is not lost when each dialog closes (R): fail - Unable to test because no dialog was found
- ❌: Each modal dialog hides content behind it while open (R): fail - Unable to test because no dialog was found
Axe WCAG Failures (2) ❌
- (1x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
- (1x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (10) ⚠️
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 0 (Grok 4 Fast Non-Reasoning)
Instruction set: 1. Basic
FAIL | Latency 0.00s
Axe WCAG: 4
Assertions ✅
- ✅: Each dialog has a dialog role (R): pass
- ✅: Each dialog can be closed by escape key (BP): pass
- ✅: Each modal dialog traps keyboard focus (R): pass
- ✅: Each modal dialog takes focus when opened (R): pass
- ✅: Focus is not lost when each dialog closes (R): pass
- ✅: Each modal dialog hides content behind it while open (R): pass
Axe WCAG Failures (4) ❌
- (2x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
- (2x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Sample 1 (Grok 4 Fast Non-Reasoning)
Instruction set: 1. Basic
FAIL | Latency 0.00s
Axe WCAG: 4
Assertions ❌
- ✅: Each dialog has a dialog role (R): pass
- ✅: Each dialog can be closed by escape key (BP): pass
- ✅: Each modal dialog traps keyboard focus (R): pass
- ✅: Each modal dialog takes focus when opened (R): pass
- ✅: Focus is not lost when each dialog closes (R): pass
- ❌: Each modal dialog hides content behind it while open (R): fail
Axe WCAG Failures (4) ❌
- (2x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
- (2x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Sample 2 (Grok 4 Fast Non-Reasoning)
Instruction set: 1. Basic
FAIL | Latency 0.00s
Axe WCAG: 2
Assertions ✅
- ✅: Each dialog has a dialog role (R): pass
- ✅: Each dialog can be closed by escape key (BP): pass
- ✅: Each modal dialog traps keyboard focus (R): pass
- ✅: Each modal dialog takes focus when opened (R): pass
- ✅: Focus is not lost when each dialog closes (R): pass
- ✅: Each modal dialog hides content behind it while open (R): pass
Axe WCAG Failures (2) ❌
- (1x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
- (1x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Sample 3 (Grok 4 Fast Non-Reasoning)
Instruction set: 1. Basic
FAIL | Latency 0.00s
Axe WCAG: 4
Assertions ❌
- ✅: Each dialog has a dialog role (R): pass
- ✅: Each dialog can be closed by escape key (BP): pass
- ❌: Each modal dialog traps keyboard focus (R): fail
- ✅: Each modal dialog takes focus when opened (R): pass
- ✅: Focus is not lost when each dialog closes (R): pass
- ❌: Each modal dialog hides content behind it while open (R): fail
Axe WCAG Failures (4) ❌
- (2x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
- (2x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Sample 4 (Grok 4 Fast Non-Reasoning)
Instruction set: 1. Basic
FAIL | Latency 0.00s
Axe WCAG: 6
Assertions ❌
- ✅: Each dialog has a dialog role (R): pass
- ✅: Each dialog can be closed by escape key (BP): pass
- ✅: Each modal dialog traps keyboard focus (R): pass
- ✅: Each modal dialog takes focus when opened (R): pass
- ✅: Focus is not lost when each dialog closes (R): pass
- ❌: Each modal dialog hides content behind it while open (R): fail
Axe WCAG Failures (6) ❌
- (3x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
- (3x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Sample 0 (Grok 4 Fast Non-Reasoning)
Instruction set: 0. Minimal
FAIL | Latency 0.00s
Axe WCAG: 4
Assertions ❌
- ✅: Each dialog has a dialog role (R): pass
- ✅: Each dialog can be closed by escape key (BP): pass
- ✅: Each modal dialog traps keyboard focus (R): pass
- ✅: Each modal dialog takes focus when opened (R): pass
- ✅: Focus is not lost when each dialog closes (R): pass
- ❌: Each modal dialog hides content behind it while open (R): fail
Axe WCAG Failures (4) ❌
- (2x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
- (2x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Sample 1 (Grok 4 Fast Non-Reasoning)
Instruction set: 0. Minimal
PASS | Latency 0.00s
Axe WCAG: 0
Assertions ✅
- ✅: Each dialog has a dialog role (R): pass
- ✅: Each dialog can be closed by escape key (BP): pass
- ✅: Each modal dialog traps keyboard focus (R): pass
- ✅: Each modal dialog takes focus when opened (R): pass
- ✅: Focus is not lost when each dialog closes (R): pass
- ✅: Each modal dialog hides content behind it while open (R): pass
Sample 2 (Grok 4 Fast Non-Reasoning)
Instruction set: 0. Minimal
FAIL | Latency 0.00s
Axe WCAG: 4
Assertions ❌
- ✅: Each dialog has a dialog role (R): pass
- ✅: Each dialog can be closed by escape key (BP): pass
- ✅: Each modal dialog traps keyboard focus (R): pass
- ✅: Each modal dialog takes focus when opened (R): pass
- ✅: Focus is not lost when each dialog closes (R): pass
- ❌: Each modal dialog hides content behind it while open (R): fail
Axe WCAG Failures (4) ❌
- (2x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
- (2x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Sample 3 (Grok 4 Fast Non-Reasoning)
Instruction set: 0. Minimal
FAIL | Latency 0.00s
Axe WCAG: 4
Assertions ❌
- ✅: Each dialog has a dialog role (R): pass
- ✅: Each dialog can be closed by escape key (BP): pass
- ✅: Each modal dialog traps keyboard focus (R): pass
- ✅: Each modal dialog takes focus when opened (R): pass
- ✅: Focus is not lost when each dialog closes (R): pass
- ❌: Each modal dialog hides content behind it while open (R): fail
Axe WCAG Failures (4) ❌
- (2x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
- (2x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Sample 4 (Grok 4 Fast Non-Reasoning)
Instruction set: 0. Minimal
PASS | Latency 0.00s
Axe WCAG: 0
Assertions ✅
- ✅: Each dialog has a dialog role (R): pass
- ✅: Each dialog can be closed by escape key (BP): pass
- ✅: Each modal dialog traps keyboard focus (R): pass
- ✅: Each modal dialog takes focus when opened (R): pass
- ✅: Focus is not lost when each dialog closes (R): pass
- ✅: Each modal dialog hides content behind it while open (R): pass
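The PASS and FAIL verdicts in these samples are consistent with a simple rule: a sample passes only when it has zero Axe WCAG failures and every required (R) assertion passes. The sketch below encodes that reading. The harness's actual logic is not shown in this report, so the `Assertion` type, the `sample_passes` function, and the treatment of best-practice (BP) assertions as non-blocking are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class Assertion:
    name: str
    required: bool  # True for (R) assertions, False for (BP) assertions
    passed: bool


def sample_passes(axe_wcag_failures: int, assertions: list[Assertion]) -> bool:
    """Assumed verdict rule: no Axe WCAG failures and no failed (R) assertion.

    Axe best-practice issues are not an input at all, matching the report's
    note that they do not affect pass/fail. Ignoring failed (BP) assertions
    is an assumption.
    """
    required_ok = all(a.passed for a in assertions if a.required)
    return axe_wcag_failures == 0 and required_ok


# Hypothetical inputs shaped like the dialog samples in this report.
all_green = [
    Assertion("Each dialog has a dialog role", True, True),
    Assertion("Each dialog can be closed by escape key", False, True),
    Assertion("Each modal dialog traps keyboard focus", True, True),
]
print(sample_passes(0, all_green))  # clean Axe run, all assertions pass
print(sample_passes(4, all_green))  # same assertions, 4 Axe WCAG failures
```

Under this rule, a sample whose assertions all pass still fails when Axe reports color-contrast violations, which is the pattern seen in several of the instructed samples.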
Sample 0 (Grok 4 Fast Non-Reasoning)
Instruction set: 2. Detailed Instructions
FAIL | Latency 0.00s
Axe WCAG: 2
Assertions ❌
- ✅: Each dialog has a dialog role (R): pass
- ✅: Each dialog can be closed by escape key (BP): pass
- ✅: Each modal dialog traps keyboard focus (R): pass
- ✅: Each modal dialog takes focus when opened (R): pass
- ✅: Focus is not lost when each dialog closes (R): pass
- ❌: Each modal dialog hides content behind it while open (R): fail
Axe WCAG Failures (2) ❌
- (1x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
- (1x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Sample 1 (Grok 4 Fast Non-Reasoning)
Instruction set: 2. Detailed Instructions
FAIL | Latency 0.00s
Axe WCAG: 0
Assertions ❌
- ❌: Each dialog has a dialog role (R): fail
- ❌: Each dialog can be closed by escape key (BP): fail - Unable to test because no dialog was found
- ❌: Each modal dialog traps keyboard focus (R): fail - Unable to test because no dialog was found
- ❌: Each modal dialog takes focus when opened (R): fail - Unable to test because no dialog was found
- ❌: Focus is not lost when each dialog closes (R): fail - Unable to test because no dialog was found
- ❌: Each modal dialog hides content behind it while open (R): fail - Unable to test because no dialog was found
Sample 2 (Grok 4 Fast Non-Reasoning)
Instruction set: 2. Detailed Instructions
FAIL | Latency 0.00s
Axe WCAG: 2
Assertions ❌
- ✅: Each dialog has a dialog role (R): pass
- ✅: Each dialog can be closed by escape key (BP): pass
- ✅: Each modal dialog traps keyboard focus (R): pass
- ❌: Each modal dialog takes focus when opened (R): fail
- ✅: Focus is not lost when each dialog closes (R): pass
- ❌: Each modal dialog hides content behind it while open (R): fail
Axe WCAG Failures (2) ❌
- (1x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
- (1x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Sample 3 (Grok 4 Fast Non-Reasoning)
Instruction set: 2. Detailed Instructions
FAIL | Latency 0.00s
Axe WCAG: 0
Assertions ❌
- ❌: Each dialog has a dialog role (R): fail
- ❌: Each dialog can be closed by escape key (BP): fail - Unable to test because no dialog was found
- ❌: Each modal dialog traps keyboard focus (R): fail - Unable to test because no dialog was found
- ❌: Each modal dialog takes focus when opened (R): fail - Unable to test because no dialog was found
- ❌: Focus is not lost when each dialog closes (R): fail - Unable to test because no dialog was found
- ❌: Each modal dialog hides content behind it while open (R): fail - Unable to test because no dialog was found
Sample 4 (Grok 4 Fast Non-Reasoning)
Instruction set: 2. Detailed Instructions
FAIL | Latency 0.00s
Axe WCAG: 4
Assertions ❌
- ✅: Each dialog has a dialog role (R): pass
- ✅: Each dialog can be closed by escape key (BP): pass
- ✅: Each modal dialog traps keyboard focus (R): pass
- ❌: Each modal dialog takes focus when opened (R): fail
- ✅: Focus is not lost when each dialog closes (R): pass
- ❌: Each modal dialog hides content behind it while open (R): fail
Axe WCAG Failures (4) ❌
- (2x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
- (2x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
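Nearly every failing dialog sample above trips the color-contrast rule. As a rough guide to what that rule measures, the sketch below computes the WCAG 2 contrast ratio from two sRGB hex colors and compares it with the AA thresholds (4.5:1 for normal text, 3:1 for large text). The function names and the example colors are illustrative assumptions; they are not taken from the generated pages, and axe applies additional logic (font size detection, background resolution) that is not modeled here.

```python
def _linear(channel: int) -> float:
    """Convert an 8-bit sRGB channel to linear light, per the WCAG 2 definition."""
    c = channel / 255
    return c / 12.92 if c <= 0.03928 else ((c + 0.055) / 1.055) ** 2.4


def relative_luminance(hex_color: str) -> float:
    """Relative luminance of a color given as '#rrggbb'."""
    h = hex_color.lstrip("#")
    r, g, b = (int(h[i:i + 2], 16) for i in (0, 2, 4))
    return 0.2126 * _linear(r) + 0.7152 * _linear(g) + 0.0722 * _linear(b)


def contrast_ratio(fg: str, bg: str) -> float:
    """WCAG contrast ratio: (lighter + 0.05) / (darker + 0.05)."""
    lighter, darker = sorted(
        (relative_luminance(fg), relative_luminance(bg)), reverse=True
    )
    return (lighter + 0.05) / (darker + 0.05)


def meets_aa(fg: str, bg: str, large_text: bool = False) -> bool:
    """True when the pair meets the AA minimum for the given text size."""
    return contrast_ratio(fg, bg) >= (3.0 if large_text else 4.5)


# Hypothetical colors: two nearly identical grays on a white background.
for gray in ("#767676", "#777777"):
    print(gray, round(contrast_ratio(gray, "#ffffff"), 2), meets_aa(gray, "#ffffff"))
```

The two grays differ by one step per channel, yet they fall on opposite sides of the 4.5:1 threshold, which helps explain why default color palettes so often produce a handful of contrast failures per page.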
shopping-home-page
Prompt
Generate an HTML file for the home page of a shopping website. The home page has several sections of content for popular items, deals, and blog posts.
DeepSeek V3.2
Aggregates are shown when filtering to a specific instruction set.
Samples: 25 | Passes: 0
| pass@1 | pass@5 | pass@10 |
|---|---|---|
| 0% | 0% | 0% |
Samples: 5 | Passes: 0
| pass@1 | pass@5 | pass@10 |
|---|---|---|
| 0% | 0% | 0% |
Samples: 5 | Passes: 1
| pass@1 | pass@5 | pass@10 |
|---|---|---|
| 20% | 100% | 100% |
Samples: 5 | Passes: 0
| pass@1 | pass@5 | pass@10 |
|---|---|---|
| 0% | 0% | 0% |
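The pass@k figures above can be reproduced with the standard combinatorial estimator: the probability that k samples drawn without replacement from n samples, c of which passed, include at least one pass. The sketch below is a minimal version of that calculation. The report does not say how it handles k larger than n (for example pass@10 over 5 samples), so returning 100% when there is at least one pass and 0% otherwise is an assumption chosen to match the displayed values.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Estimate P(at least one of k drawn samples passes), given c passes in n."""
    if c == 0:
        return 0.0
    if n - c < k:
        # Too few failing samples to fill a draw of k, so a pass is certain.
        # This branch also covers k > n, which is an assumption (see above).
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)


# (samples, passes) pairs taken from the aggregate tables above.
for n, c in [(25, 0), (5, 0), (5, 1), (5, 0)]:
    row = " | ".join(f"{pass_at_k(n, c, k):.0%}" for k in (1, 5, 10))
    print(f"Samples: {n} | Passes: {c} | {row}")
```

With 5 samples and 1 pass, pass@1 is 20% while pass@5 is 100%, because drawing all five samples necessarily includes the one that passed.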
Sample 0 (DeepSeek V3.2)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 16
Assertions ✅
- ✅: Has an h1 (R): pass
- ✅: Has single h1 (BP): pass
- ✅: Has at least one h2 (R): pass
- ✅: Has a single banner (R): pass
- ✅: Has a single maincontent (R): pass
- ✅: Has at least one navigation (R): pass
- ✅: Has a single footer (R): pass
Axe WCAG Failures (16) ❌
- (1x) - button-name (critical): Ensure buttons have discernible text
- (11x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
- (4x) - link-name (serious): Ensure links have discernible text
Sample 1 (DeepSeek V3.2)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 24
Assertions ✅
- ✅: Has an h1 (R): pass
- ✅: Has single h1 (BP): pass
- ✅: Has at least one h2 (R): pass
- ✅: Has a single banner (R): pass
- ✅: Has a single maincontent (R): pass
- ✅: Has at least one navigation (R): pass
- ✅: Has a single footer (R): pass
Axe WCAG Failures (24) ❌
- (17x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
- (7x) - link-name (serious): Ensure links have discernible text
Sample 2 (DeepSeek V3.2)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 17
Assertions ✅
- ✅: Has an h1 (R): pass
- ✅: Has single h1 (BP): pass
- ✅: Has at least one h2 (R): pass
- ✅: Has a single banner (R): pass
- ✅: Has a single maincontent (R): pass
- ✅: Has at least one navigation (R): pass
- ✅: Has a single footer (R): pass
Axe WCAG Failures (17) ❌
- (1x) - button-name (critical): Ensure buttons have discernible text
- (12x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
- (4x) - link-name (serious): Ensure links have discernible text
Sample 3 (DeepSeek V3.2)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 20
Assertions ✅
- ✅: Has an h1 (R): pass
- ✅: Has single h1 (BP): pass
- ✅: Has at least one h2 (R): pass
- ✅: Has a single banner (R): pass
- ✅: Has a single maincontent (R): pass
- ✅: Has at least one navigation (R): pass
- ✅: Has a single footer (R): pass
Axe WCAG Failures (20) ❌
- (1x) - button-name (critical): Ensure buttons have discernible text
- (15x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
- (4x) - link-name (serious): Ensure links have discernible text
Sample 4 (DeepSeek V3.2)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 25
Assertions ✅
- ✅: Has an h1 (R): pass
- ✅: Has single h1 (BP): pass
- ✅: Has at least one h2 (R): pass
- ✅: Has a single banner (R): pass
- ✅: Has a single maincontent (R): pass
- ✅: Has at least one navigation (R): pass
- ✅: Has a single footer (R): pass
Axe WCAG Failures (25) ❌
- (22x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
- (3x) - link-name (serious): Ensure links have discernible text
Sample 5 (DeepSeek V3.2)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 28 | BP: 5
Assertions ❌
- ✅: Has an h1 (R): pass
- ✅: Has single h1 (BP): pass
- ✅: Has at least one h2 (R): pass
- ✅: Has a single banner (R): pass
- ❌: Has a single maincontent (R): fail
- ✅: Has at least one navigation (R): pass
- ✅: Has a single footer (R): pass
Axe WCAG Failures (28) ❌
- (25x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
- (3x) - link-name (serious): Ensure links have discernible text
Axe Best Practice Issues (5) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
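The structural assertions for this test case (headings and landmarks) can be approximated by counting elements in the generated HTML. The sketch below is one way to do that with Python's standard `html.parser`; it is not the benchmark's implementation. The class and function names are hypothetical, and the role mapping is simplified: `header` and `footer` count as banner and contentinfo only outside sectioning elements, explicit `role` attributes override the tag, and headings created with `role="heading"` are ignored.

```python
from collections import Counter
from html.parser import HTMLParser

# Inside these elements, <header>/<footer> do not map to banner/contentinfo.
SECTIONING = {"article", "aside", "main", "nav", "section"}
IMPLICIT_ROLES = {"main": "main", "nav": "navigation"}


class LandmarkCounter(HTMLParser):
    """Counts h1/h2 elements and landmark roles while parsing."""

    def __init__(self) -> None:
        super().__init__()
        self.counts: Counter = Counter()
        self.depth = 0  # how many sectioning elements are currently open

    def handle_starttag(self, tag, attrs):
        role = dict(attrs).get("role")
        if role is None:
            if tag in IMPLICIT_ROLES:
                role = IMPLICIT_ROLES[tag]
            elif tag == "header" and self.depth == 0:
                role = "banner"
            elif tag == "footer" and self.depth == 0:
                role = "contentinfo"
        if role:
            self.counts[role] += 1
        if tag in ("h1", "h2"):
            self.counts[tag] += 1
        if tag in SECTIONING:
            self.depth += 1

    def handle_endtag(self, tag):
        if tag in SECTIONING and self.depth > 0:
            self.depth -= 1


def check_structure(html: str) -> dict[str, bool]:
    """Return one boolean per assertion name used in this report."""
    parser = LandmarkCounter()
    parser.feed(html)
    c = parser.counts
    return {
        "Has an h1": c["h1"] >= 1,
        "Has single h1": c["h1"] == 1,
        "Has at least one h2": c["h2"] >= 1,
        "Has a single banner": c["banner"] == 1,
        "Has a single maincontent": c["main"] == 1,
        "Has at least one navigation": c["navigation"] >= 1,
        "Has a single footer": c["contentinfo"] == 1,
    }


# Hypothetical page with no <main>, similar in shape to the failing sample above.
page = "<header><nav></nav></header><h1>Shop</h1><h2>Deals</h2><footer></footer>"
print(check_structure(page)["Has a single maincontent"])
```

A static count like this cannot see landmarks added by script at runtime, so a browser-based check remains the more reliable option for pages that build their structure dynamically.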
Sample 6 (DeepSeek V3.2)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 18
Assertions ✅
- ✅: Has an h1 (R): pass
- ✅: Has single h1 (BP): pass
- ✅: Has at least one h2 (R): pass
- ✅: Has a single banner (R): pass
- ✅: Has a single maincontent (R): pass
- ✅: Has at least one navigation (R): pass
- ✅: Has a single footer (R): pass
Axe WCAG Failures (18) ❌
- (2x) - button-name (critical): Ensure buttons have discernible text
- (12x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
- (4x) - link-name (serious): Ensure links have discernible text
Sample 7 (DeepSeek V3.2)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 30 | BP: 5
Assertions ❌
- ✅: Has an h1 (R): pass
- ✅: Has single h1 (BP): pass
- ✅: Has at least one h2 (R): pass
- ✅: Has a single banner (R): pass
- ❌: Has a single maincontent (R): fail
- ✅: Has at least one navigation (R): pass
- ✅: Has a single footer (R): pass
Axe WCAG Failures (30) ❌
- (1x) - button-name (critical): Ensure buttons have discernible text
- (22x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
- (7x) - link-name (serious): Ensure links have discernible text
Axe Best Practice Issues (5) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 8 (DeepSeek V3.2)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 16
Assertions ✅
- ✅: Has an h1 (R): pass
- ✅: Has single h1 (BP): pass
- ✅: Has at least one h2 (R): pass
- ✅: Has a single banner (R): pass
- ✅: Has a single maincontent (R): pass
- ✅: Has at least one navigation (R): pass
- ✅: Has a single footer (R): pass
Axe WCAG Failures (16) ❌
- (1x) - button-name (critical): Ensure buttons have discernible text
- (11x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
- (4x) - link-name (serious): Ensure links have discernible text
Sample 9 (DeepSeek V3.2)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 13
Assertions ✅
- ✅: Has an h1 (R): pass
- ✅: Has single h1 (BP): pass
- ✅: Has at least one h2 (R): pass
- ✅: Has a single banner (R): pass
- ✅: Has a single maincontent (R): pass
- ✅: Has at least one navigation (R): pass
- ✅: Has a single footer (R): pass
Axe WCAG Failures (13) ❌
- (2x) - button-name (critical): Ensure buttons have discernible text
- (7x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
- (4x) - link-name (serious): Ensure links have discernible text
Sample 10 (DeepSeek V3.2)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 27
Assertions ✅
- ✅: Has an h1 (R): pass
- ✅: Has single h1 (BP): pass
- ✅: Has at least one h2 (R): pass
- ✅: Has a single banner (R): pass
- ✅: Has a single maincontent (R): pass
- ✅: Has at least one navigation (R): pass
- ✅: Has a single footer (R): pass
Axe WCAG Failures (27) ❌
- (23x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
- (4x) - link-name (serious): Ensure links have discernible text
Sample 11 (DeepSeek V3.2)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 33
Assertions ✅
- ✅: Has an h1 (R): pass
- ✅: Has single h1 (BP): pass
- ✅: Has at least one h2 (R): pass
- ✅: Has a single banner (R): pass
- ✅: Has a single maincontent (R): pass
- ✅: Has at least one navigation (R): pass
- ✅: Has a single footer (R): pass
Axe WCAG Failures (33) ❌
- (1x) - button-name (critical): Ensure buttons have discernible text
- (28x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
- (4x) - link-name (serious): Ensure links have discernible text
Sample 12 (DeepSeek V3.2)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 17
Assertions ✅
- ✅: Has an h1 (R): pass
- ✅: Has single h1 (BP): pass
- ✅: Has at least one h2 (R): pass
- ✅: Has a single banner (R): pass
- ✅: Has a single maincontent (R): pass
- ✅: Has at least one navigation (R): pass
- ✅: Has a single footer (R): pass
Axe WCAG Failures (17) ❌
- (1x) - button-name (critical): Ensure buttons have discernible text
- (12x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
- (4x) - link-name (serious): Ensure links have discernible text
Sample 13 (DeepSeek V3.2)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 33
Assertions ✅
- ✅: Has an h1 (R): pass
- ✅: Has single h1 (BP): pass
- ✅: Has at least one h2 (R): pass
- ✅: Has a single banner (R): pass
- ✅: Has a single maincontent (R): pass
- ✅: Has at least one navigation (R): pass
- ✅: Has a single footer (R): pass
Axe WCAG Failures (33) ❌
- (28x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
- (5x) - link-name (serious): Ensure links have discernible text
Sample 14 (DeepSeek V3.2)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 34
Assertions ✅
- ✅: Has an h1 (R): pass
- ✅: Has single h1 (BP): pass
- ✅: Has at least one h2 (R): pass
- ✅: Has a single banner (R): pass
- ✅: Has a single maincontent (R): pass
- ✅: Has at least one navigation (R): pass
- ✅: Has a single footer (R): pass
Axe WCAG Failures (34) ❌
- (1x) - button-name (critical): Ensure buttons have discernible text
- (29x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
- (4x) - link-name (serious): Ensure links have discernible text
Sample 15 (DeepSeek V3.2)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 26
Assertions ✅
- ✅: Has an h1 (R): pass
- ✅: Has single h1 (BP): pass
- ✅: Has at least one h2 (R): pass
- ✅: Has a single banner (R): pass
- ✅: Has a single maincontent (R): pass
- ✅: Has at least one navigation (R): pass
- ✅: Has a single footer (R): pass
Axe WCAG Failures (26) ❌
- (19x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
- (7x) - link-name (serious): Ensure links have discernible text
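The `link-name` rule flags links with no discernible text. The report does not show the generated markup, but one common cause is an icon-only link with no text, `aria-label`, or image `alt`. The sketch below is a deliberately simplified check in that spirit, using only Python's standard `html.parser`. It is not how axe computes accessible names (it ignores `aria-labelledby`, hidden content, and more), and the class name `LinkNameChecker` is hypothetical.

```python
from html.parser import HTMLParser


class LinkNameChecker(HTMLParser):
    """Collect hrefs of <a href> elements with no text, aria-label, title, or img alt."""

    def __init__(self):
        super().__init__()
        self.unnamed = []    # href values of links where no name was found
        self._href = None    # href of the link currently open, if any
        self._named = False  # whether the open link has a name so far

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        if tag == "a" and "href" in attributes:
            self._href = attributes["href"]
            label = attributes.get("aria-label") or attributes.get("title") or ""
            self._named = bool(label.strip())
        elif tag == "img" and self._href is not None:
            if (attributes.get("alt") or "").strip():
                self._named = True

    def handle_data(self, data):
        if self._href is not None and data.strip():
            self._named = True

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            if not self._named:
                self.unnamed.append(self._href)
            self._href = None


# Hypothetical markup: one icon-only link, one text link, one labelled icon link.
markup = (
    '<a href="/x"><i class="icon"></i></a>'
    '<a href="/y">Home</a>'
    '<a href="/z" aria-label="Twitter"><i class="icon"></i></a>'
)
checker = LinkNameChecker()
checker.feed(markup)
print(checker.unnamed)  # ['/x']
```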
Sample 16 (DeepSeek V3.2)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 26
Assertions ✅
- ✅: Has an h1 (R): pass
- ✅: Has single h1 (BP): pass
- ✅: Has at least one h2 (R): pass
- ✅: Has a single banner (R): pass
- ✅: Has a single maincontent (R): pass
- ✅: Has at least one navigation (R): pass
- ✅: Has a single footer (R): pass
Axe WCAG Failures (26) ❌
- (1x) - button-name (critical): Ensure buttons have discernible text
- (21x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
- (4x) - link-name (serious): Ensure links have discernible text
Sample 17 (DeepSeek V3.2)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 29 | BP: 1
Assertions ✅
- ✅: Has an h1 (R): pass
- ✅: Has single h1 (BP): pass
- ✅: Has at least one h2 (R): pass
- ✅: Has a single banner (R): pass
- ✅: Has a single maincontent (R): pass
- ✅: Has at least one navigation (R): pass
- ✅: Has a single footer (R): pass
Axe WCAG Failures (29) ❌
- (22x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
- (7x) - link-name (serious): Ensure links have discernible text
Axe Best Practice Issues (1) ⚠️
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 18 (DeepSeek V3.2)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 49
Assertions ✅
- ✅: Has an h1 (R): pass
- ✅: Has single h1 (BP): pass
- ✅: Has at least one h2 (R): pass
- ✅: Has a single banner (R): pass
- ✅: Has a single maincontent (R): pass
- ✅: Has at least one navigation (R): pass
- ✅: Has a single footer (R): pass
Axe WCAG Failures (49) ❌
- (45x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
- (4x) - link-name (serious): Ensure links have discernible text
Sample 19 (DeepSeek V3.2)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 29 | BP: 1
Assertions ✅
- ✅: Has an h1 (R): pass
- ✅: Has single h1 (BP): pass
- ✅: Has at least one h2 (R): pass
- ✅: Has a single banner (R): pass
- ✅: Has a single maincontent (R): pass
- ✅: Has at least one navigation (R): pass
- ✅: Has a single footer (R): pass
Axe WCAG Failures (29) ❌
- (1x) - button-name (critical): Ensure buttons have discernible text
- (24x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
- (4x) - link-name (serious): Ensure links have discernible text
Axe Best Practice Issues (1) ⚠️
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 20 (DeepSeek V3.2)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 24 | BP: 1
Assertions ✅
- ✅: Has an h1 (R): pass
- ✅: Has single h1 (BP): pass
- ✅: Has at least one h2 (R): pass
- ✅: Has a single banner (R): pass
- ✅: Has a single maincontent (R): pass
- ✅: Has at least one navigation (R): pass
- ✅: Has a single footer (R): pass
Axe WCAG Failures (24) ❌
- (20x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
- (4x) - link-name (serious): Ensure links have discernible text
Axe Best Practice Issues (1) ⚠️
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 21 (DeepSeek V3.2)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 14
Assertions ✅
- ✅: Has an h1 (R): pass
- ✅: Has single h1 (BP): pass
- ✅: Has at least one h2 (R): pass
- ✅: Has a single banner (R): pass
- ✅: Has a single maincontent (R): pass
- ✅: Has at least one navigation (R): pass
- ✅: Has a single footer (R): pass
Axe WCAG Failures (14) ❌
- (2x) - button-name (critical): Ensure buttons have discernible text
- (8x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
- (4x) - link-name (serious): Ensure links have discernible text
Sample 22 (DeepSeek V3.2)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 18
Assertions ✅
- ✅: Has an h1 (R): pass
- ✅: Has single h1 (BP): pass
- ✅: Has at least one h2 (R): pass
- ✅: Has a single banner (R): pass
- ✅: Has a single maincontent (R): pass
- ✅: Has at least one navigation (R): pass
- ✅: Has a single footer (R): pass
Axe WCAG Failures (18) ❌
- (18x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
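`color-contrast` is the most frequent failure in these samples. For reference, the WCAG 2 contrast ratio is (L1 + 0.05) / (L2 + 0.05), where L1 and L2 are the relative luminances of the lighter and darker colors, and level AA requires at least 4.5:1 for normal text and 3:1 for large text. The sketch below implements only that formula for opaque `#rrggbb` colors. axe's real check also has to resolve computed styles, opacity, and backgrounds, which this does not attempt, and the function names are illustrative.

```python
def relative_luminance(hex_color: str) -> float:
    """WCAG 2 relative luminance of an opaque sRGB color given as '#rrggbb'."""
    value = hex_color.lstrip("#")
    channels = [int(value[i:i + 2], 16) / 255 for i in (0, 2, 4)]
    # Linearize each sRGB channel using the threshold from the WCAG 2 definition.
    red, green, blue = [
        c / 12.92 if c <= 0.03928 else ((c + 0.055) / 1.055) ** 2.4
        for c in channels
    ]
    return 0.2126 * red + 0.7152 * green + 0.0722 * blue


def contrast_ratio(foreground: str, background: str) -> float:
    """Contrast ratio between two colors, from 1.0 up to 21.0."""
    lum_a = relative_luminance(foreground)
    lum_b = relative_luminance(background)
    lighter, darker = max(lum_a, lum_b), min(lum_a, lum_b)
    return (lighter + 0.05) / (darker + 0.05)


def meets_aa(foreground: str, background: str, large_text: bool = False) -> bool:
    """True if the pair meets the WCAG 2 AA minimum for the given text size."""
    return contrast_ratio(foreground, background) >= (3.0 if large_text else 4.5)


print(meets_aa("#767676", "#ffffff"))  # True
print(meets_aa("#777777", "#ffffff"))  # False
```

Mid-gray text on a white background sits right at the threshold, which is why small palette choices can produce dozens of violations on a single page.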
Sample 23 (DeepSeek V3.2)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 47
Assertions ✅
- ✅: Has an h1 (R): pass
- ✅: Has single h1 (BP): pass
- ✅: Has at least one h2 (R): pass
- ✅: Has a single banner (R): pass
- ✅: Has a single maincontent (R): pass
- ✅: Has at least one navigation (R): pass
- ✅: Has a single footer (R): pass
Axe WCAG Failures (47) ❌
- (42x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
- (4x) - link-name (serious): Ensure links have discernible text
- (1x) - list (serious): Ensure that lists are structured correctly
Sample 24 (DeepSeek V3.2)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 43
Assertions ✅
- ✅: Has an h1 (R): pass
- ✅: Has single h1 (BP): pass
- ✅: Has at least one h2 (R): pass
- ✅: Has a single banner (R): pass
- ✅: Has a single maincontent (R): pass
- ✅: Has at least one navigation (R): pass
- ✅: Has a single footer (R): pass
Axe WCAG Failures (43) ❌
- (39x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
- (4x) - link-name (serious): Ensure links have discernible text
Sample 0 (DeepSeek V3.2)
Instruction set: 1. Basic
FAIL | Latency 0.00s
Axe WCAG: 6
Assertions ✅
- ✅: Has an h1 (R): pass
- ✅: Has single h1 (BP): pass
- ✅: Has at least one h2 (R): pass
- ✅: Has a single banner (R): pass
- ✅: Has a single maincontent (R): pass
- ✅: Has at least one navigation (R): pass
- ✅: Has a single footer (R): pass
Axe WCAG Failures (6) ❌
- (6x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Sample 1 (DeepSeek V3.2)
Instruction set: 1. Basic
FAIL | Latency 0.00s
Axe WCAG: 0 | BP: 1
Assertions ❌
- ❌: Has an h1 (R): fail
- ❌: Has single h1 (BP): fail
- ✅: Has at least one h2 (R): pass
- ✅: Has a single banner (R): pass
- ✅: Has a single maincontent (R): pass
- ✅: Has at least one navigation (R): pass
- ✅: Has a single footer (R): pass
Axe Best Practice Issues (1) ⚠️
- page-has-heading-one (moderate): Ensure that the page, or at least one of its frames contains a level-one heading (Best Practice - does not affect pass/fail)
Sample 2 (DeepSeek V3.2)
Instruction set: 1. Basic
FAIL | Latency 0.00s
Axe WCAG: 1
Assertions ✅
- ✅: Has an h1 (R): pass
- ✅: Has single h1 (BP): pass
- ✅: Has at least one h2 (R): pass
- ✅: Has a single banner (R): pass
- ✅: Has a single maincontent (R): pass
- ✅: Has at least one navigation (R): pass
- ✅: Has a single footer (R): pass
Axe WCAG Failures (1) ❌
- (1x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Sample 3 (DeepSeek V3.2)
Instruction set: 1. Basic
PASS | Latency 0.00s
Axe WCAG: 0
Assertions ✅
- ✅: Has an h1 (R): pass
- ✅: Has single h1 (BP): pass
- ✅: Has at least one h2 (R): pass
- ✅: Has a single banner (R): pass
- ✅: Has a single maincontent (R): pass
- ✅: Has at least one navigation (R): pass
- ✅: Has a single footer (R): pass
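Across the samples in this section, a sample is marked PASS only when it has zero Axe WCAG failures and every required (R) assertion passes. Best-practice (BP) assertions and Axe best-practice issues are reported but do not appear to change the verdict. The sketch below encodes that reading. It is inferred from the listed results rather than taken from the benchmark's source, and the names `Assertion` and `sample_passes` are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Assertion:
    name: str
    kind: str     # "R" (required) or "BP" (best practice), as tagged in the report
    passed: bool


def sample_passes(axe_wcag_failures: int, assertions: list[Assertion]) -> bool:
    """PASS needs zero Axe WCAG failures and all required assertions passing."""
    required_ok = all(a.passed for a in assertions if a.kind == "R")
    return axe_wcag_failures == 0 and required_ok


# "1. Basic" Sample 1: no Axe WCAG failures, but the required h1 assertion fails.
basic_sample_1 = [
    Assertion("Has an h1", "R", False),
    Assertion("Has single h1", "BP", False),
    Assertion("Has at least one h2", "R", True),
]
# "1. Basic" Sample 3: no Axe WCAG failures and every assertion passes.
basic_sample_3 = [
    Assertion("Has an h1", "R", True),
    Assertion("Has single h1", "BP", True),
    Assertion("Has at least one h2", "R", True),
]
print(sample_passes(0, basic_sample_1))  # False
print(sample_passes(0, basic_sample_3))  # True
```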
Sample 4 (DeepSeek V3.2)
Instruction set: 1. Basic
FAIL | Latency 0.00s
Axe WCAG: 6
Assertions ✅
- ✅: Has an h1 (R): pass
- ✅: Has single h1 (BP): pass
- ✅: Has at least one h2 (R): pass
- ✅: Has a single banner (R): pass
- ✅: Has a single maincontent (R): pass
- ✅: Has at least one navigation (R): pass
- ✅: Has a single footer (R): pass
Axe WCAG Failures (6) ❌
- (6x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Sample 0 (DeepSeek V3.2)
Instruction set: 0. Minimal
FAIL | Latency 0.00s
Axe WCAG: 54
Assertions ✅
- ✅: Has an h1 (R): pass
- ✅: Has single h1 (BP): pass
- ✅: Has at least one h2 (R): pass
- ✅: Has a single banner (R): pass
- ✅: Has a single maincontent (R): pass
- ✅: Has at least one navigation (R): pass
- ✅: Has a single footer (R): pass
Axe WCAG Failures (54) ❌
- (54x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Sample 1 (DeepSeek V3.2)
Instruction set: 0. Minimal
FAIL | Latency 0.00s
Axe WCAG: 29
Assertions ✅
- ✅: Has an h1 (R): pass
- ✅: Has single h1 (BP): pass
- ✅: Has at least one h2 (R): pass
- ✅: Has a single banner (R): pass
- ✅: Has a single maincontent (R): pass
- ✅: Has at least one navigation (R): pass
- ✅: Has a single footer (R): pass
Axe WCAG Failures (29) ❌
- (22x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
- (7x) - link-name (serious): Ensure links have discernible text
Sample 2 (DeepSeek V3.2)
Instruction set: 0. Minimal
FAIL | Latency 0.00s
Axe WCAG: 42
Assertions ✅
- ✅: Has an h1 (R): pass
- ✅: Has single h1 (BP): pass
- ✅: Has at least one h2 (R): pass
- ✅: Has a single banner (R): pass
- ✅: Has a single maincontent (R): pass
- ✅: Has at least one navigation (R): pass
- ✅: Has a single footer (R): pass
Axe WCAG Failures (42) ❌
- (42x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Sample 3 (DeepSeek V3.2)
Instruction set: 0. Minimal
FAIL | Latency 0.00s
Axe WCAG: 24
Assertions ✅
- ✅: Has an h1 (R): pass
- ❌: Has single h1 (BP): fail
- ✅: Has at least one h2 (R): pass
- ✅: Has a single banner (R): pass
- ✅: Has a single maincontent (R): pass
- ✅: Has at least one navigation (R): pass
- ✅: Has a single footer (R): pass
Axe WCAG Failures (24) ❌
- (24x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Sample 4 (DeepSeek V3.2)
Instruction set: 0. Minimal
FAIL | Latency 0.00s
Axe WCAG: 22
Assertions ✅
- ✅: Has an h1 (R): pass
- ✅: Has single h1 (BP): pass
- ✅: Has at least one h2 (R): pass
- ✅: Has a single banner (R): pass
- ✅: Has a single maincontent (R): pass
- ✅: Has at least one navigation (R): pass
- ✅: Has a single footer (R): pass
Axe WCAG Failures (22) ❌
- (1x) - button-name (critical): Ensure buttons have discernible text
- (17x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
- (4x) - link-name (serious): Ensure links have discernible text
Sample 0 (DeepSeek V3.2)
Instruction set: 2. Detailed Instructions
FAIL | Latency 0.00s
Axe WCAG: 15
Assertions ✅
- ✅: Has an h1 (R): pass
- ✅: Has single h1 (BP): pass
- ✅: Has at least one h2 (R): pass
- ✅: Has a single banner (R): pass
- ✅: Has a single maincontent (R): pass
- ✅: Has at least one navigation (R): pass
- ✅: Has a single footer (R): pass
Axe WCAG Failures (15) ❌
- (15x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Sample 1 (DeepSeek V3.2)
Instruction set: 2. Detailed Instructions
FAIL | Latency 0.00s
Axe WCAG: 13
Assertions ✅
- ✅: Has an h1 (R): pass
- ✅: Has single h1 (BP): pass
- ✅: Has at least one h2 (R): pass
- ✅: Has a single banner (R): pass
- ✅: Has a single maincontent (R): pass
- ✅: Has at least one navigation (R): pass
- ✅: Has a single footer (R): pass
Axe WCAG Failures (13) ❌
- (13x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Sample 2 (DeepSeek V3.2)
Instruction set: 2. Detailed Instructions
FAIL | Latency 0.00s
Axe WCAG: 11
Assertions ✅
- ✅: Has an h1 (R): pass
- ✅: Has single h1 (BP): pass
- ✅: Has at least one h2 (R): pass
- ✅: Has a single banner (R): pass
- ✅: Has a single maincontent (R): pass
- ✅: Has at least one navigation (R): pass
- ✅: Has a single footer (R): pass
Axe WCAG Failures (11) ❌
- (11x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Sample 3 (DeepSeek V3.2)
Instruction set: 2. Detailed Instructions
FAIL | Latency 0.00s
Axe WCAG: 13
Assertions ✅
- ✅: Has an h1 (R): pass
- ✅: Has single h1 (BP): pass
- ✅: Has at least one h2 (R): pass
- ✅: Has a single banner (R): pass
- ✅: Has a single maincontent (R): pass
- ✅: Has at least one navigation (R): pass
- ✅: Has a single footer (R): pass
Axe WCAG Failures (13) ❌
- (13x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Sample 4 (DeepSeek V3.2)
Instruction set: 2. Detailed Instructions
FAIL | Latency 0.00s
Axe WCAG: 16
Assertions ✅
- ✅: Has an h1 (R): pass
- ✅: Has single h1 (BP): pass
- ✅: Has at least one h2 (R): pass
- ✅: Has a single banner (R): pass
- ✅: Has a single maincontent (R): pass
- ✅: Has at least one navigation (R): pass
- ✅: Has a single footer (R): pass
Axe WCAG Failures (16) ❌
- (16x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
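For a quick comparison across instruction sets, the Axe WCAG counts of the five DeepSeek V3.2 samples listed above for each non-control set can be averaged. The sketch below does only that arithmetic on the counts copied from the sample headers. The dictionary name `axe_counts` is illustrative.

```python
from statistics import mean

# Axe WCAG counts copied from the DeepSeek V3.2 sample headers above (Samples 0-4).
axe_counts = {
    "0. Minimal": [54, 29, 42, 24, 22],
    "1. Basic": [6, 0, 1, 0, 6],
    "2. Detailed Instructions": [15, 13, 11, 13, 16],
}

averages = {name: mean(counts) for name, counts in axe_counts.items()}
for name, value in averages.items():
    print(f"{name}: {value:.1f}")
# 0. Minimal: 34.2
# 1. Basic: 2.6
# 2. Detailed Instructions: 13.6
```

With only five samples per set, these averages are a rough indication rather than a stable estimate.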
Claude Haiku 4.5
Aggregates are shown when filtering to a specific instruction set.
| Samples | Passes | pass@1 | pass@5 | pass@10 |
|---|---|---|---|---|
| 25 | 0 | 0% | 0% | 0% |
| 5 | 0 | 0% | 0% | 0% |
| 5 | 0 | 0% | 0% | 0% |
| 5 | 5 | 100% | 100% | 100% |
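The pass@k figures can be reproduced from the sample and pass counts. The sketch below assumes the report uses the usual combinatorial estimator, pass@k = 1 - C(n - c, k) / C(n, k) for n samples with c passes. That assumption is consistent with the published values (for example 9 passes out of 25 gives 36% at k = 1 and 92% at k = 5), but the benchmark's own code is not shown here. Capping k at n and returning 0 when there are no passes are further assumptions, chosen so that the 5-sample rows come out as 0% and 100%.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Probability that at least one of k samples drawn from n (c passing) passes."""
    if k < 1 or not 0 <= c <= n:
        raise ValueError("require k >= 1 and 0 <= c <= n")
    if c == 0:
        return 0.0        # no passing sample can ever be drawn
    k = min(k, n)         # assumption: cap k at the number of samples
    if n - c < k:
        return 1.0        # every draw of k samples must include a pass
    return 1.0 - comb(n - c, k) / comb(n, k)


print(f"{pass_at_k(25, 0, 10):.0%}")  # 0%
print(f"{pass_at_k(5, 5, 10):.0%}")   # 100%
print(f"{pass_at_k(25, 9, 5):.0%}")   # 92%
```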
Sample 0 (Claude Haiku 4.5)
Instruction set: Control
FAIL | Latency 0.00s cached
Axe WCAG: 23 | BP: 30 | $0.0248
Assertions ❌
- ✅: Has an h1 (R): pass
- ✅: Has single h1 (BP): pass
- ✅: Has at least one h2 (R): pass
- ✅: Has a single banner (R): pass
- ❌: Has a single maincontent (R): fail
- ✅: Has at least one navigation (R): pass
- ✅: Has a single footer (R): pass
Axe WCAG Failures (23) ❌
- (23x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (30) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 1 (Claude Haiku 4.5)
Instruction set: Control
FAIL | Latency 0.00s cached
Axe WCAG: 31 | BP: 42 | $0.0248
Assertions ❌
- ✅: Has an h1 (R): pass
- ✅: Has single h1 (BP): pass
- ✅: Has at least one h2 (R): pass
- ✅: Has a single banner (R): pass
- ❌: Has a single maincontent (R): fail
- ✅: Has at least one navigation (R): pass
- ✅: Has a single footer (R): pass
Axe WCAG Failures (31) ❌
- (31x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (42) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 2 (Claude Haiku 4.5)
Instruction set: Control
FAIL | Latency 0.00s cached
Axe WCAG: 20 | BP: 30 | $0.0252
Assertions ❌
- ✅: Has an h1 (R): pass
- ✅: Has single h1 (BP): pass
- ✅: Has at least one h2 (R): pass
- ✅: Has a single banner (R): pass
- ❌: Has a single maincontent (R): fail
- ✅: Has at least one navigation (R): pass
- ✅: Has a single footer (R): pass
Axe WCAG Failures (20) ❌
- (20x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (30) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 3 (Claude Haiku 4.5)
Instruction set: Control
FAIL | Latency 0.00s cached
Axe WCAG: 23 | BP: 31 | $0.0256
Assertions ❌
- ✅: Has an h1 (R): pass
- ✅: Has single h1 (BP): pass
- ✅: Has at least one h2 (R): pass
- ✅: Has a single banner (R): pass
- ❌: Has a single maincontent (R): fail
- ✅: Has at least one navigation (R): pass
- ✅: Has a single footer (R): pass
Axe WCAG Failures (23) ❌
- (23x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (31) ⚠️
- heading-order (moderate): Ensure the order of headings is semantically correct (Best Practice - does not affect pass/fail)
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 4 (Claude Haiku 4.5)
Instruction set: Control
FAIL | Latency 0.00s cached
Axe WCAG: 22 | BP: 31 | $0.0241
Assertions ❌
- ✅: Has an h1 (R): pass
- ✅: Has single h1 (BP): pass
- ✅: Has at least one h2 (R): pass
- ✅: Has a single banner (R): pass
- ❌: Has a single maincontent (R): fail
- ✅: Has at least one navigation (R): pass
- ✅: Has a single footer (R): pass
Axe WCAG Failures (22) ❌
- (22x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (31) ⚠️
- heading-order (moderate): Ensure the order of headings is semantically correct (Best Practice - does not affect pass/fail)
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 5 (Claude Haiku 4.5)
Instruction set: Control
FAIL | Latency 0.00s cached
Axe WCAG: 25 | BP: 39 | $0.0235
Assertions ❌
- ✅: Has an h1 (R): pass
- ✅: Has single h1 (BP): pass
- ✅: Has at least one h2 (R): pass
- ✅: Has a single banner (R): pass
- ❌: Has a single maincontent (R): fail
- ✅: Has at least one navigation (R): pass
- ✅: Has a single footer (R): pass
Axe WCAG Failures (25) ❌
- (25x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (39) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 6 (Claude Haiku 4.5)
Instruction set: Control
FAIL | Latency 0.00s cached
Axe WCAG: 19 | BP: 22 | $0.0224
Assertions ❌
- ✅: Has an h1 (R): pass
- ✅: Has single h1 (BP): pass
- ✅: Has at least one h2 (R): pass
- ✅: Has a single banner (R): pass
- ❌: Has a single maincontent (R): fail
- ✅: Has at least one navigation (R): pass
- ✅: Has a single footer (R): pass
Axe WCAG Failures (19) ❌
- (19x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (22) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 7 (Claude Haiku 4.5)
Instruction set: Control
FAIL | Latency 0.00s cached
Axe WCAG: 19 | BP: 31 | $0.0234
Assertions ❌
- ✅: Has an h1 (R): pass
- ✅: Has single h1 (BP): pass
- ✅: Has at least one h2 (R): pass
- ✅: Has a single banner (R): pass
- ❌: Has a single maincontent (R): fail
- ✅: Has at least one navigation (R): pass
- ✅: Has a single footer (R): pass
Axe WCAG Failures (19) ❌
- (19x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (31) ⚠️
- heading-order (moderate): Ensure the order of headings is semantically correct (Best Practice - does not affect pass/fail)
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 8 (Claude Haiku 4.5)
Instruction set: Control
FAIL | Latency 0.00s cached
Axe WCAG: 26 | BP: 30 | $0.0250
Assertions ❌
- ✅: Has an h1 (R): pass
- ✅: Has single h1 (BP): pass
- ✅: Has at least one h2 (R): pass
- ✅: Has a single banner (R): pass
- ❌: Has a single maincontent (R): fail
- ✅: Has at least one navigation (R): pass
- ✅: Has a single footer (R): pass
Axe WCAG Failures (26) ❌
- (26x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (30) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 9 (Claude Haiku 4.5)
Instruction set: Control
FAIL | Latency 0.00s cached
Axe WCAG: 16 | BP: 22 | $0.0242
Assertions ❌
- ✅: Has an h1 (R): pass
- ✅: Has single h1 (BP): pass
- ✅: Has at least one h2 (R): pass
- ✅: Has a single banner (R): pass
- ❌: Has a single maincontent (R): fail
- ✅: Has at least one navigation (R): pass
- ✅: Has a single footer (R): pass
Axe WCAG Failures (16) ❌
- (16x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (22) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 10 (Claude Haiku 4.5)
Instruction set: Control
FAIL | Latency 0.00s cached
Axe WCAG: 28 | BP: 42 | $0.0244
Assertions ❌
- ✅: Has an h1 (R): pass
- ✅: Has single h1 (BP): pass
- ✅: Has at least one h2 (R): pass
- ✅: Has a single banner (R): pass
- ❌: Has a single maincontent (R): fail
- ✅: Has at least one navigation (R): pass
- ✅: Has a single footer (R): pass
Axe WCAG Failures (28) ❌
- (28x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (42) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 11 (Claude Haiku 4.5)
Instruction set: Control
FAIL | Latency 0.00s cached
Axe WCAG: 26 | BP: 30 | $0.0238
Assertions ❌
- ✅: Has an h1 (R): pass
- ✅: Has single h1 (BP): pass
- ✅: Has at least one h2 (R): pass
- ✅: Has a single banner (R): pass
- ❌: Has a single maincontent (R): fail
- ✅: Has at least one navigation (R): pass
- ✅: Has a single footer (R): pass
Axe WCAG Failures (26) ❌
- (26x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (30) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 12 (Claude Haiku 4.5)
Instruction set: Control
FAIL | Latency 0.00s cached
Axe WCAG: 19 | BP: 30 | $0.0242
Assertions ❌
- ✅: Has an h1 (R): pass
- ✅: Has single h1 (BP): pass
- ✅: Has at least one h2 (R): pass
- ✅: Has a single banner (R): pass
- ❌: Has a single maincontent (R): fail
- ✅: Has at least one navigation (R): pass
- ✅: Has a single footer (R): pass
Axe WCAG Failures (19) ❌
- (19x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (30) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 13 (Claude Haiku 4.5)
Instruction set: Control
FAIL | Latency 0.00s cached
Axe WCAG: 24 | BP: 30 | $0.0255
Assertions ❌
- ✅: Has an h1 (R): pass
- ✅: Has single h1 (BP): pass
- ✅: Has at least one h2 (R): pass
- ✅: Has a single banner (R): pass
- ❌: Has a single maincontent (R): fail
- ✅: Has at least one navigation (R): pass
- ✅: Has a single footer (R): pass
Axe WCAG Failures (24) ❌
- (24x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (30) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 14 (Claude Haiku 4.5)
Instruction set: Control
FAIL | Latency 0.00s cached
Axe WCAG: 19 | BP: 22 | $0.0225
Assertions ❌
- ✅: Has an h1 (R): pass
- ✅: Has single h1 (BP): pass
- ✅: Has at least one h2 (R): pass
- ✅: Has a single banner (R): pass
- ❌: Has a single maincontent (R): fail
- ✅: Has at least one navigation (R): pass
- ✅: Has a single footer (R): pass
Axe WCAG Failures (19) ❌
- (19x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (22) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 15 (Claude Haiku 4.5)
Instruction set: Control
FAIL | Latency 0.00s cached
Axe WCAG: 18 | BP: 23 | $0.0225
Assertions ❌
- ✅: Has an h1 (R): pass
- ✅: Has single h1 (BP): pass
- ✅: Has at least one h2 (R): pass
- ✅: Has a single banner (R): pass
- ❌: Has a single maincontent (R): fail
- ✅: Has at least one navigation (R): pass
- ✅: Has a single footer (R): pass
Axe WCAG Failures (18) ❌
- (18x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (23) ⚠️
- heading-order (moderate): Ensure the order of headings is semantically correct (Best Practice - does not affect pass/fail)
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 16 (Claude Haiku 4.5)
Instruction set: Control
FAIL | Latency 0.00s cached
Axe WCAG: 21 | BP: 34 | $0.0219
Assertions ❌
- ✅: Has an h1 (R): pass
- ✅: Has single h1 (BP): pass
- ✅: Has at least one h2 (R): pass
- ✅: Has a single banner (R): pass
- ❌: Has a single maincontent (R): fail
- ✅: Has at least one navigation (R): pass
- ✅: Has a single footer (R): pass
Axe WCAG Failures (21) ❌
- (21x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (34) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 17 (Claude Haiku 4.5)
Instruction set: Control
FAIL | Latency 0.00s cached
Axe WCAG: 23 | BP: 35 | $0.0215
Assertions ❌
- ✅: Has an h1 (R): pass
- ✅: Has single h1 (BP): pass
- ✅: Has at least one h2 (R): pass
- ✅: Has a single banner (R): pass
- ❌: Has a single maincontent (R): fail
- ✅: Has at least one navigation (R): pass
- ✅: Has a single footer (R): pass
Axe WCAG Failures (23) ❌
- (23x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (35) ⚠️
- heading-order (moderate): Ensure the order of headings is semantically correct (Best Practice - does not affect pass/fail)
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 18 (Claude Haiku 4.5)
Instruction set: Control
FAIL | Latency 0.00s cached
Axe WCAG: 19 | BP: 30 | $0.0244
Assertions ❌
- ✅: Has an h1 (R): pass
- ✅: Has single h1 (BP): pass
- ✅: Has at least one h2 (R): pass
- ✅: Has a single banner (R): pass
- ❌: Has a single maincontent (R): fail
- ✅: Has at least one navigation (R): pass
- ✅: Has a single footer (R): pass
Axe WCAG Failures (19) ❌
- (19x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (30) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 19 (Claude Haiku 4.5)
Instruction set: Control
FAIL | Latency 0.00s cached
Axe WCAG: 19 | BP: 31 | $0.0247
Assertions ❌
- ✅: Has an h1 (R): pass
- ✅: Has single h1 (BP): pass
- ✅: Has at least one h2 (R): pass
- ✅: Has a single banner (R): pass
- ❌: Has a single maincontent (R): fail
- ✅: Has at least one navigation (R): pass
- ✅: Has a single footer (R): pass
Axe WCAG Failures (19) ❌
- (19x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (31) ⚠️
- heading-order (moderate): Ensure the order of headings is semantically correct (Best Practice - does not affect pass/fail)
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 20 (Claude Haiku 4.5)
Instruction set: Control
FAIL | Latency 0.00s cached
Axe WCAG: 19 | BP: 31 | $0.0239
Assertions ❌
- ✅: Has an h1 (R): pass
- ✅: Has single h1 (BP): pass
- ✅: Has at least one h2 (R): pass
- ✅: Has a single banner (R): pass
- ❌: Has a single maincontent (R): fail
- ✅: Has at least one navigation (R): pass
- ✅: Has a single footer (R): pass
Axe WCAG Failures (19) ❌
- (19x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (31) ⚠️
- heading-order (moderate): Ensure the order of headings is semantically correct (Best Practice - does not affect pass/fail)
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 21 (Claude Haiku 4.5)
Instruction set: Control
FAIL | Latency 0.00s cached
Axe WCAG: 33 | BP: 42 | $0.0261
Assertions ❌
- ✅: Has an h1 (R): pass
- ✅: Has single h1 (BP): pass
- ✅: Has at least one h2 (R): pass
- ✅: Has a single banner (R): pass
- ❌: Has a single maincontent (R): fail
- ✅: Has at least one navigation (R): pass
- ✅: Has a single footer (R): pass
Axe WCAG Failures (33) ❌
- (33x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (42) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 22 (Claude Haiku 4.5)
Instruction set: Control
FAIL | Latency 0.00s cached
Axe WCAG: 23 | BP: 32 | $0.0227
Assertions ❌
- ✅: Has an h1 (R): pass
- ✅: Has single h1 (BP): pass
- ✅: Has at least one h2 (R): pass
- ✅: Has a single banner (R): pass
- ❌: Has a single maincontent (R): fail
- ✅: Has at least one navigation (R): pass
- ✅: Has a single footer (R): pass
Axe WCAG Failures (23) ❌
- (23x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (32) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 23 (Claude Haiku 4.5)
Instruction set: Control
FAIL | Latency 0.00s cached
Axe WCAG: 26 | BP: 30 | $0.0256
Assertions ❌
- ✅: Has an h1 (R): pass
- ✅: Has single h1 (BP): pass
- ✅: Has at least one h2 (R): pass
- ✅: Has a single banner (R): pass
- ❌: Has a single maincontent (R): fail
- ✅: Has at least one navigation (R): pass
- ✅: Has a single footer (R): pass
Axe WCAG Failures (26) ❌
- (26x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (30) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 24 (Claude Haiku 4.5)
Instruction set: Control
FAIL | Latency 0.00s cached
Axe WCAG: 33 | BP: 43 | $0.0265
Assertions ❌
- ✅: Has an h1 (R): pass
- ✅: Has single h1 (BP): pass
- ✅: Has at least one h2 (R): pass
- ✅: Has a single banner (R): pass
- ❌: Has a single maincontent (R): fail
- ✅: Has at least one navigation (R): pass
- ✅: Has a single footer (R): pass
Axe WCAG Failures (33) ❌
- (33x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (43) ⚠️
- heading-order (moderate): Ensure the order of headings is semantically correct (Best Practice - does not affect pass/fail)
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 0 (Claude Haiku 4.5)
Instruction set: 1. Basic
FAIL | Latency 0.00s cached
Axe WCAG: 22 | $0.0246
Assertions ✅
- ✅: Has an h1 (R): pass
- ✅: Has single h1 (BP): pass
- ✅: Has at least one h2 (R): pass
- ✅: Has a single banner (R): pass
- ✅: Has a single maincontent (R): pass
- ✅: Has at least one navigation (R): pass
- ✅: Has a single footer (R): pass
Axe WCAG Failures (22) ❌
- (22x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
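The `color-contrast` failures refer to the WCAG 2 AA contrast thresholds (4.5:1 for normal text, 3:1 for large text). As a rough illustration of the arithmetic, the sketch below computes the WCAG 2 contrast ratio for two hex colors. It is not axe's implementation, which also has to resolve the effective foreground and background of each element; the function names are hypothetical.

```python
def _linearize(channel: int) -> float:
    # Convert an 8-bit sRGB channel to its linear value (WCAG 2 definition).
    c = channel / 255
    return c / 12.92 if c <= 0.03928 else ((c + 0.055) / 1.055) ** 2.4


def relative_luminance(hex_color: str) -> float:
    # Expects a 6-digit hex color such as "#1a2b3c".
    h = hex_color.lstrip("#")
    r, g, b = (int(h[i:i + 2], 16) for i in (0, 2, 4))
    return 0.2126 * _linearize(r) + 0.7152 * _linearize(g) + 0.0722 * _linearize(b)


def contrast_ratio(fg: str, bg: str) -> float:
    # Ratio of the lighter to the darker luminance, each offset by 0.05.
    lighter, darker = sorted(
        (relative_luminance(fg), relative_luminance(bg)), reverse=True
    )
    return (lighter + 0.05) / (darker + 0.05)


def meets_aa(fg: str, bg: str, large_text: bool = False) -> bool:
    # AA minimum: 4.5:1 for normal text, 3:1 for large text.
    return contrast_ratio(fg, bg) >= (3.0 if large_text else 4.5)
```

For example, `#767676` on white comes out at roughly 4.54:1 and passes for normal text, while the slightly lighter `#777777` falls just under 4.5:1 and passes only at the large-text threshold.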
Sample 1 (Claude Haiku 4.5)
Instruction set: 1. Basic
FAIL | Latency 0.00s cached
Axe WCAG: 34 | BP: 1 | $0.0268
Assertions ✅
- ✅: Has an h1 (R): pass
- ✅: Has single h1 (BP): pass
- ✅: Has at least one h2 (R): pass
- ✅: Has a single banner (R): pass
- ✅: Has a single maincontent (R): pass
- ✅: Has at least one navigation (R): pass
- ✅: Has a single footer (R): pass
Axe WCAG Failures (34) ❌
- (34x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (1) ⚠️
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 2 (Claude Haiku 4.5)
Instruction set: 1. Basic
FAIL | Latency 0.00s cached
Axe WCAG: 42 | BP: 1 | $0.0257
Assertions ✅
- ✅: Has an h1 (R): pass
- ✅: Has single h1 (BP): pass
- ✅: Has at least one h2 (R): pass
- ✅: Has a single banner (R): pass
- ✅: Has a single maincontent (R): pass
- ✅: Has at least one navigation (R): pass
- ✅: Has a single footer (R): pass
Axe WCAG Failures (42) ❌
- (42x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (1) ⚠️
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 3 (Claude Haiku 4.5)
Instruction set: 1. Basic
FAIL | Latency 0.00s cached
Axe WCAG: 41 | BP: 2 | $0.0243
Assertions ✅
- ✅: Has an h1 (R): pass
- ✅: Has single h1 (BP): pass
- ✅: Has at least one h2 (R): pass
- ✅: Has a single banner (R): pass
- ✅: Has a single maincontent (R): pass
- ✅: Has at least one navigation (R): pass
- ✅: Has a single footer (R): pass
Axe WCAG Failures (41) ❌
- (41x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (2) ⚠️
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 4 (Claude Haiku 4.5)
Instruction set: 1. Basic
FAIL | Latency 0.00s cached
Axe WCAG: 30 | BP: 1 | $0.0251
Assertions ✅
- ✅: Has an h1 (R): pass
- ✅: Has single h1 (BP): pass
- ✅: Has at least one h2 (R): pass
- ✅: Has a single banner (R): pass
- ✅: Has a single maincontent (R): pass
- ✅: Has at least one navigation (R): pass
- ✅: Has a single footer (R): pass
Axe WCAG Failures (30) ❌
- (30x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (1) ⚠️
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 0 (Claude Haiku 4.5)
Instruction set: 0. Minimal
FAIL | Latency 0.00s cached
Axe WCAG: 17 | $0.0220
Assertions ✅
- ✅: Has an h1 (R): pass
- ✅: Has single h1 (BP): pass
- ✅: Has at least one h2 (R): pass
- ✅: Has a single banner (R): pass
- ✅: Has a single maincontent (R): pass
- ✅: Has at least one navigation (R): pass
- ✅: Has a single footer (R): pass
Axe WCAG Failures (17) ❌
- (17x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Sample 1 (Claude Haiku 4.5)
Instruction set: 0. Minimal
FAIL | Latency 0.00s cached
Axe WCAG: 23 | BP: 42 | $0.0239
Assertions ❌
- ✅: Has an h1 (R): pass
- ✅: Has single h1 (BP): pass
- ✅: Has at least one h2 (R): pass
- ✅: Has a single banner (R): pass
- ❌: Has a single maincontent (R): fail
- ✅: Has at least one navigation (R): pass
- ✅: Has a single footer (R): pass
Axe WCAG Failures (23) ❌
- (23x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (42) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 2 (Claude Haiku 4.5)
Instruction set: 0. Minimal
FAIL | Latency 0.00s cached
Axe WCAG: 19 | BP: 42 | $0.0231
Assertions ❌
- ✅: Has an h1 (R): pass
- ✅: Has single h1 (BP): pass
- ✅: Has at least one h2 (R): pass
- ✅: Has a single banner (R): pass
- ❌: Has a single maincontent (R): fail
- ✅: Has at least one navigation (R): pass
- ✅: Has a single footer (R): pass
Axe WCAG Failures (19) ❌
- (19x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (42) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 3 (Claude Haiku 4.5)
Instruction set: 0. Minimal
FAIL | Latency 0.00s cached
Axe WCAG: 27 | BP: 39 | $0.0243
Assertions ❌
- ✅: Has an h1 (R): pass
- ✅: Has single h1 (BP): pass
- ✅: Has at least one h2 (R): pass
- ✅: Has a single banner (R): pass
- ❌: Has a single maincontent (R): fail
- ✅: Has at least one navigation (R): pass
- ✅: Has a single footer (R): pass
Axe WCAG Failures (27) ❌
- (27x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (39) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 4 (Claude Haiku 4.5)
Instruction set: 0. Minimal
FAIL | Latency 0.00s cached
Axe WCAG: 14 | BP: 31 | $0.0225
Assertions ❌
- ✅: Has an h1 (R): pass
- ✅: Has single h1 (BP): pass
- ✅: Has at least one h2 (R): pass
- ✅: Has a single banner (R): pass
- ❌: Has a single maincontent (R): fail
- ✅: Has at least one navigation (R): pass
- ✅: Has a single footer (R): pass
Axe WCAG Failures (14) ❌
- (14x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (31) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 0 (Claude Haiku 4.5)
Instruction set: 2. Detailed Instructions
PASS | Latency 0.00s cached
Axe WCAG: 0 | $0.0425
Assertions ✅
- ✅: Has an h1 (R): pass
- ✅: Has single h1 (BP): pass
- ✅: Has at least one h2 (R): pass
- ✅: Has a single banner (R): pass
- ✅: Has a single maincontent (R): pass
- ✅: Has at least one navigation (R): pass
- ✅: Has a single footer (R): pass
Sample 1 (Claude Haiku 4.5)
Instruction set: 2. Detailed Instructions
PASS | Latency 0.00s cached
Axe WCAG: 0 | $0.0434
Assertions ✅
- ✅: Has an h1 (R): pass
- ✅: Has single h1 (BP): pass
- ✅: Has at least one h2 (R): pass
- ✅: Has a single banner (R): pass
- ✅: Has a single maincontent (R): pass
- ✅: Has at least one navigation (R): pass
- ✅: Has a single footer (R): pass
Sample 2 (Claude Haiku 4.5)
Instruction set: 2. Detailed Instructions
PASS | Latency 40.25s
Axe WCAG: 0 | $0.0496
Assertions ✅
- ✅: Has an h1 (R): pass
- ✅: Has single h1 (BP): pass
- ✅: Has at least one h2 (R): pass
- ✅: Has a single banner (R): pass
- ✅: Has a single maincontent (R): pass
- ✅: Has at least one navigation (R): pass
- ✅: Has a single footer (R): pass
Sample 3 (Claude Haiku 4.5)
Instruction set: 2. Detailed Instructions
PASS | Latency 34.27s
Axe WCAG: 0 | $0.0432
Assertions ✅
- ✅: Has an h1 (R): pass
- ✅: Has single h1 (BP): pass
- ✅: Has at least one h2 (R): pass
- ✅: Has a single banner (R): pass
- ✅: Has a single maincontent (R): pass
- ✅: Has at least one navigation (R): pass
- ✅: Has a single footer (R): pass
Sample 4 (Claude Haiku 4.5)
Instruction set: 2. Detailed Instructions
PASS | Latency 36.20s
Axe WCAG: 0 | $0.0441
Assertions ✅
- ✅: Has an h1 (R): pass
- ✅: Has single h1 (BP): pass
- ✅: Has at least one h2 (R): pass
- ✅: Has a single banner (R): pass
- ✅: Has a single maincontent (R): pass
- ✅: Has at least one navigation (R): pass
- ✅: Has a single footer (R): pass
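The samples above suggest how a verdict is reached: a sample with every assertion passing still fails when Axe reports WCAG failures, and best-practice issues are marked as not affecting pass/fail. A minimal sketch of that rule follows, assuming "(R)" marks a required assertion and "(BP)" a best-practice one; the `Assertion` type and `sample_passes` function are hypothetical and not taken from the harness.

```python
from dataclasses import dataclass


@dataclass
class Assertion:
    name: str
    kind: str      # "R" (assumed: required) or "BP" (assumed: best practice)
    passed: bool


def sample_passes(axe_wcag_failures: int, assertions: list[Assertion]) -> bool:
    # Assumed rule: no Axe WCAG failures and every required assertion passes.
    # Best-practice assertions and Axe best-practice issues are ignored.
    required_ok = all(a.passed for a in assertions if a.kind == "R")
    return axe_wcag_failures == 0 and required_ok
```

Under this rule, the "1. Basic" samples fail on contrast alone, while the "2. Detailed Instructions" samples pass with zero Axe WCAG failures and all assertions green.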
Claude Opus 4.6
Aggregates are shown when filtering to a specific instruction set.
| Samples | Passes | pass@1 | pass@5 | pass@10 |
|---|---|---|---|---|
| 25 | 0 | 0% | 0% | 0% |
| 5 | 0 | 0% | 0% | 0% |
| 5 | 0 | 0% | 0% | 0% |
| 5 | 2 | 40% | 100% | 100% |
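The pass@k figures above are consistent with the commonly used unbiased estimator, 1 - C(n - c, k) / C(n, k), where n is the number of samples and c the number of passes. The report does not state its exact formula, so the sketch below is an assumption. In particular, returning 0 when there are no passes and 1 when fewer than k failing samples exist (including k larger than n) is chosen here only to reproduce the 5-sample values shown.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Estimated probability that at least one of k drawn samples passes.

    n: total samples, c: passing samples, k: number of samples drawn.
    """
    if c == 0:
        # No passing samples: no draw can contain a pass.
        return 0.0
    if n - c < k:
        # Fewer than k failing samples: every draw of k includes a pass.
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)
```

With 5 samples and 2 passes, this gives 0.4 for k = 1 and 1.0 for k = 5 and k = 10, matching the 40% / 100% / 100% aggregate.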
Sample 0 (Claude Opus 4.6)
Instruction set: Control
FAIL | Latency 0.00s cached
Axe WCAG: 62 | BP: 35 | $0.3180
Assertions ❌
- ✅: Has an h1 (R): pass
- ✅: Has single h1 (BP): pass
- ✅: Has at least one h2 (R): pass
- ✅: Has a single banner (R): pass
- ❌: Has a single maincontent (R): fail
- ✅: Has at least one navigation (R): pass
- ✅: Has a single footer (R): pass
Axe WCAG Failures (62) ❌
- (62x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (35) ⚠️
- heading-order (moderate): Ensure the order of headings is semantically correct (Best Practice - does not affect pass/fail)
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 1 (Claude Opus 4.6)
Instruction set: Control
FAIL | Latency 0.00s cached
Axe WCAG: 63 | BP: 32 | $0.3162
Assertions ❌
- ✅: Has an h1 (R): pass
- ✅: Has single h1 (BP): pass
- ✅: Has at least one h2 (R): pass
- ✅: Has a single banner (R): pass
- ❌: Has a single maincontent (R): fail
- ✅: Has at least one navigation (R): pass
- ✅: Has a single footer (R): pass
Axe WCAG Failures (63) ❌
- (63x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (32) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 2 (Claude Opus 4.6)
Instruction set: Control
FAIL | Latency 0.00s cached
Axe WCAG: 64 | BP: 33 | $0.2938
Assertions ❌
- ✅: Has an h1 (R): pass
- ✅: Has single h1 (BP): pass
- ✅: Has at least one h2 (R): pass
- ✅: Has a single banner (R): pass
- ❌: Has a single maincontent (R): fail
- ✅: Has at least one navigation (R): pass
- ✅: Has a single footer (R): pass
Axe WCAG Failures (64) ❌
- (64x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (33) ⚠️
- heading-order (moderate): Ensure the order of headings is semantically correct (Best Practice - does not affect pass/fail)
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 3 (Claude Opus 4.6)
Instruction set: Control
FAIL | Latency 0.00s cached
Axe WCAG: 67 | BP: 50 | $0.3127
Assertions ❌
- ✅: Has an h1 (R): pass
- ✅: Has single h1 (BP): pass
- ✅: Has at least one h2 (R): pass
- ✅: Has a single banner (R): pass
- ❌: Has a single maincontent (R): fail
- ✅: Has at least one navigation (R): pass
- ✅: Has a single footer (R): pass
Axe WCAG Failures (67) ❌
- (67x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (50) ⚠️
- heading-order (moderate): Ensure the order of headings is semantically correct (Best Practice - does not affect pass/fail)
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 4 (Claude Opus 4.6)
Instruction set: Control
FAIL | Latency 0.00s cached
Axe WCAG: 79 | BP: 35 | $0.3041
Assertions ❌
- ✅: Has an h1 (R): pass
- ✅: Has single h1 (BP): pass
- ✅: Has at least one h2 (R): pass
- ✅: Has a single banner (R): pass
- ❌: Has a single maincontent (R): fail
- ✅: Has at least one navigation (R): pass
- ✅: Has a single footer (R): pass
Axe WCAG Failures (79) ❌
- (79x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (35) ⚠️
- heading-order (moderate): Ensure the order of headings is semantically correct (Best Practice - does not affect pass/fail)
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 5 (Claude Opus 4.6)
Instruction set: Control
FAIL | Latency 0.00s cached
Axe WCAG: 80 | BP: 45 | $0.3579
Assertions ❌
- ✅: Has an h1 (R): pass
- ✅: Has single h1 (BP): pass
- ✅: Has at least one h2 (R): pass
- ✅: Has a single banner (R): pass
- ❌: Has a single maincontent (R): fail
- ✅: Has at least one navigation (R): pass
- ✅: Has a single footer (R): pass
Axe WCAG Failures (80) ❌
- (80x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (45) ⚠️
- heading-order (moderate): Ensure the order of headings is semantically correct (Best Practice - does not affect pass/fail)
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 6 (Claude Opus 4.6)
Instruction set: Control
FAIL | Latency 0.00s cached
Axe WCAG: 65 | BP: 37 | $0.3639
Assertions ❌
- ✅: Has an h1 (R): pass
- ✅: Has single h1 (BP): pass
- ✅: Has at least one h2 (R): pass
- ✅: Has a single banner (R): pass
- ❌: Has a single maincontent (R): fail
- ✅: Has at least one navigation (R): pass
- ✅: Has a single footer (R): pass
Axe WCAG Failures (65) ❌
- (65x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (37) ⚠️
- heading-order (moderate): Ensure the order of headings is semantically correct (Best Practice - does not affect pass/fail)
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 7 (Claude Opus 4.6)
Instruction set: Control
FAIL | Latency 0.00s cached
Axe WCAG: 79 | BP: 50 | $0.2973
Assertions ❌
- ✅: Has an h1 (R): pass
- ✅: Has single h1 (BP): pass
- ✅: Has at least one h2 (R): pass
- ✅: Has a single banner (R): pass
- ❌: Has a single maincontent (R): fail
- ✅: Has at least one navigation (R): pass
- ✅: Has a single footer (R): pass
Axe WCAG Failures (79) ❌
- (79x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (50) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 8 (Claude Opus 4.6)
Instruction set: Control
FAIL | Latency 0.00s cached
Axe WCAG: 62 | BP: 26 | $0.3162
Assertions ❌
- ✅: Has an h1 (R): pass
- ✅: Has single h1 (BP): pass
- ✅: Has at least one h2 (R): pass
- ✅: Has a single banner (R): pass
- ❌: Has a single maincontent (R): fail
- ✅: Has at least one navigation (R): pass
- ✅: Has a single footer (R): pass
Axe WCAG Failures (62) ❌
- (62x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (26) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 9 (Claude Opus 4.6)
Instruction set: Control
FAIL | Latency 0.00s cached
Axe WCAG: 44 | BP: 14 | $0.3520
Assertions ❌
- ✅: Has an h1 (R): pass
- ✅: Has single h1 (BP): pass
- ✅: Has at least one h2 (R): pass
- ✅: Has a single banner (R): pass
- ❌: Has a single maincontent (R): fail
- ✅: Has at least one navigation (R): pass
- ✅: Has a single footer (R): pass
Axe WCAG Failures (44) ❌
- (44x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (14) ⚠️
- heading-order (moderate): Ensure the order of headings is semantically correct (Best Practice - does not affect pass/fail)
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 10 (Claude Opus 4.6)
Instruction set: Control
FAIL | Latency 0.00s cached
Axe WCAG: 53 | BP: 33 | $0.3433
Assertions ❌
- ✅: Has an h1 (R): pass
- ✅: Has single h1 (BP): pass
- ✅: Has at least one h2 (R): pass
- ✅: Has a single banner (R): pass
- ❌: Has a single maincontent (R): fail
- ✅: Has at least one navigation (R): pass
- ✅: Has a single footer (R): pass
Axe WCAG Failures (53) ❌
- (53x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (33) ⚠️
- heading-order (moderate): Ensure the order of headings is semantically correct (Best Practice - does not affect pass/fail)
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 11 (Claude Opus 4.6)
Instruction set: Control
FAIL | Latency 0.00s cached
Axe WCAG: 65 | BP: 32 | $0.2893
Assertions ❌
- ✅: Has an h1 (R): pass
- ✅: Has single h1 (BP): pass
- ✅: Has at least one h2 (R): pass
- ✅: Has a single banner (R): pass
- ❌: Has a single maincontent (R): fail
- ✅: Has at least one navigation (R): pass
- ✅: Has a single footer (R): pass
Axe WCAG Failures (65) ❌
- (65x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (32) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 12 (Claude Opus 4.6)
Instruction set: Control
FAIL | Latency 0.00s cached
Axe WCAG: 54 | BP: 50 | $0.3344
Assertions ❌
- ✅: Has an h1 (R): pass
- ✅: Has single h1 (BP): pass
- ✅: Has at least one h2 (R): pass
- ✅: Has a single banner (R): pass
- ❌: Has a single maincontent (R): fail
- ✅: Has at least one navigation (R): pass
- ✅: Has a single footer (R): pass
Axe WCAG Failures (54) ❌
- (54x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (50) ⚠️
- heading-order (moderate): Ensure the order of headings is semantically correct (Best Practice - does not affect pass/fail)
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 13 (Claude Opus 4.6)
Instruction set: Control
FAIL | Latency 0.00s cached
Axe WCAG: 76 | BP: 35 | $0.3298
Assertions ❌
- ✅: Has an h1 (R): pass
- ✅: Has single h1 (BP): pass
- ✅: Has at least one h2 (R): pass
- ✅: Has a single banner (R): pass
- ❌: Has a single maincontent (R): fail
- ✅: Has at least one navigation (R): pass
- ✅: Has a single footer (R): pass
Axe WCAG Failures (76) ❌
- (76x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (35) ⚠️
- heading-order (moderate): Ensure the order of headings is semantically correct (Best Practice - does not affect pass/fail)
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 14 (Claude Opus 4.6)
Instruction set: Control
FAIL | Latency 0.00s cached
Axe WCAG: 71 | BP: 39 | $0.3623
Assertions ❌
- ✅: Has an h1 (R): pass
- ✅: Has single h1 (BP): pass
- ✅: Has at least one h2 (R): pass
- ✅: Has a single banner (R): pass
- ❌: Has a single maincontent (R): fail
- ✅: Has at least one navigation (R): pass
- ✅: Has a single footer (R): pass
Axe WCAG Failures (71) ❌
- (71x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (39) ⚠️
- heading-order (moderate): Ensure the order of headings is semantically correct (Best Practice - does not affect pass/fail)
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 15 (Claude Opus 4.6)
Instruction set: Control
FAIL | Latency 0.00s cached
Axe WCAG: 75 | BP: 37 | $0.3745
Assertions ❌
- ✅: Has an h1 (R): pass
- ✅: Has single h1 (BP): pass
- ✅: Has at least one h2 (R): pass
- ✅: Has a single banner (R): pass
- ❌: Has a single maincontent (R): fail
- ✅: Has at least one navigation (R): pass
- ✅: Has a single footer (R): pass
Axe WCAG Failures (75) ❌
- (75x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (37) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 16 (Claude Opus 4.6)
Instruction set: Control
FAIL | Latency 0.00s cached
Axe WCAG: 54 | BP: 36 | $0.3358
Assertions ❌
- ✅: Has an h1 (R): pass
- ✅: Has single h1 (BP): pass
- ✅: Has at least one h2 (R): pass
- ✅: Has a single banner (R): pass
- ❌: Has a single maincontent (R): fail
- ✅: Has at least one navigation (R): pass
- ✅: Has a single footer (R): pass
Axe WCAG Failures (54) ❌
- (54x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (36) ⚠️
- heading-order (moderate): Ensure the order of headings is semantically correct (Best Practice - does not affect pass/fail)
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 17 (Claude Opus 4.6)
Instruction set: Control
FAIL | Latency 0.00s cached
Axe WCAG: 70 | BP: 46 | $0.3118
Assertions ❌
- ✅: Has an h1 (R): pass
- ✅: Has single h1 (BP): pass
- ✅: Has at least one h2 (R): pass
- ✅: Has a single banner (R): pass
- ❌: Has a single maincontent (R): fail
- ✅: Has at least one navigation (R): pass
- ✅: Has a single footer (R): pass
Axe WCAG Failures (70) ❌
- (70x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (46) ⚠️
- heading-order (moderate): Ensure the order of headings is semantically correct (Best Practice - does not affect pass/fail)
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 18 (Claude Opus 4.6)
Instruction set: Control
FAIL | Latency 0.00s cached
Axe WCAG: 63 | BP: 36 | $0.3682
Assertions ❌
- ✅: Has an h1 (R): pass
- ✅: Has single h1 (BP): pass
- ✅: Has at least one h2 (R): pass
- ✅: Has a single banner (R): pass
- ❌: Has a single maincontent (R): fail
- ✅: Has at least one navigation (R): pass
- ✅: Has a single footer (R): pass
Axe WCAG Failures (63) ❌
- (63x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (36) ⚠️
- heading-order (moderate): Ensure the order of headings is semantically correct (Best Practice - does not affect pass/fail)
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 19 (Claude Opus 4.6)
Instruction set: Control
FAIL | Latency 0.00s cached
Axe WCAG: 78 | BP: 57 | $0.3319
Assertions ❌
- ✅: Has an h1 (R): pass
- ✅: Has single h1 (BP): pass
- ✅: Has at least one h2 (R): pass
- ✅: Has a single banner (R): pass
- ❌: Has a single maincontent (R): fail
- ✅: Has at least one navigation (R): pass
- ✅: Has a single footer (R): pass
Axe WCAG Failures (78) ❌
- (78x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (57) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 20 (Claude Opus 4.6)
Instruction set: Control
FAIL | Latency 0.00s cached
Axe WCAG: 52 | BP: 34 | $0.3414
Assertions ❌
- ✅: Has an h1 (R): pass
- ✅: Has single h1 (BP): pass
- ✅: Has at least one h2 (R): pass
- ✅: Has a single banner (R): pass
- ❌: Has a single maincontent (R): fail
- ✅: Has at least one navigation (R): pass
- ✅: Has a single footer (R): pass
Axe WCAG Failures (52) ❌
- (52x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (34) ⚠️
- heading-order (moderate): Ensure the order of headings is semantically correct (Best Practice - does not affect pass/fail)
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 21 (Claude Opus 4.6)
Instruction set: Control
FAIL | Latency 0.00s cached
Axe WCAG: 51 | BP: 34 | $0.3434
Assertions ❌
- ✅: Has an h1 (R): pass
- ✅: Has single h1 (BP): pass
- ✅: Has at least one h2 (R): pass
- ✅: Has a single banner (R): pass
- ❌: Has a single maincontent (R): fail
- ✅: Has at least one navigation (R): pass
- ✅: Has a single footer (R): pass
Axe WCAG Failures (51) ❌
- (51x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (34) ⚠️
- heading-order (moderate): Ensure the order of headings is semantically correct (Best Practice - does not affect pass/fail)
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 22 (Claude Opus 4.6)
Instruction set: Control
FAIL | Latency 0.00s cached
Axe WCAG: 47 | BP: 15 | $0.3627
Assertions ❌
- ✅: Has an h1 (R): pass
- ✅: Has single h1 (BP): pass
- ✅: Has at least one h2 (R): pass
- ✅: Has a single banner (R): pass
- ❌: Has a single maincontent (R): fail
- ✅: Has at least one navigation (R): pass
- ✅: Has a single footer (R): pass
Axe WCAG Failures (47) ❌
- (47x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (15) ⚠️
- heading-order (moderate): Ensure the order of headings is semantically correct (Best Practice - does not affect pass/fail)
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 23 (Claude Opus 4.6)
Instruction set: Control
FAIL | Latency 0.00s cached
Axe WCAG: 71 | BP: 32 | $0.3137
Assertions ❌
- ✅: Has an h1 (R): pass
- ✅: Has single h1 (BP): pass
- ✅: Has at least one h2 (R): pass
- ✅: Has a single banner (R): pass
- ❌: Has a single maincontent (R): fail
- ✅: Has at least one navigation (R): pass
- ✅: Has a single footer (R): pass
Axe WCAG Failures (71) ❌
- (71x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (32) ⚠️
- heading-order (moderate): Ensure the order of headings is semantically correct (Best Practice - does not affect pass/fail)
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 24 (Claude Opus 4.6)
Instruction set: Control
FAIL | Latency 0.00s cached
Axe WCAG: 72 | BP: 57 | $0.3231
Assertions ❌
- ✅: Has an h1 (R): pass
- ✅: Has single h1 (BP): pass
- ✅: Has at least one h2 (R): pass
- ✅: Has a single banner (R): pass
- ❌: Has a single maincontent (R): fail
- ✅: Has at least one navigation (R): pass
- ✅: Has a single footer (R): pass
Axe WCAG Failures (72) ❌
- (72x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (57) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 0 (Claude Opus 4.6)
Instruction set: 1. Basic
FAIL | Latency 0.00s cached
Axe WCAG: 16 | $0.4023
Assertions ✅
- ✅: Has an h1 (R): pass
- ✅: Has single h1 (BP): pass
- ✅: Has at least one h2 (R): pass
- ✅: Has a single banner (R): pass
- ✅: Has a single maincontent (R): pass
- ✅: Has at least one navigation (R): pass
- ✅: Has a single footer (R): pass
Axe WCAG Failures (16) ❌
- (16x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Sample 1 (Claude Opus 4.6)
Instruction set: 1. Basic
FAIL | Latency 0.00s cached
Axe WCAG: 3 | $0.2930
Assertions ✅
- ✅: Has an h1 (R): pass
- ✅: Has single h1 (BP): pass
- ✅: Has at least one h2 (R): pass
- ✅: Has a single banner (R): pass
- ✅: Has a single maincontent (R): pass
- ✅: Has at least one navigation (R): pass
- ✅: Has a single footer (R): pass
Axe WCAG Failures (3) ❌
- (3x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Sample 2 (Claude Opus 4.6)
Instruction set: 1. Basic
FAIL | Latency 0.00s cached
Axe WCAG: 16 | $0.3537
Assertions ✅
- ✅: Has an h1 (R): pass
- ✅: Has single h1 (BP): pass
- ✅: Has at least one h2 (R): pass
- ✅: Has a single banner (R): pass
- ✅: Has a single maincontent (R): pass
- ✅: Has at least one navigation (R): pass
- ✅: Has a single footer (R): pass
Axe WCAG Failures (16) ❌
- (16x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Sample 3 (Claude Opus 4.6)
Instruction set: 1. Basic
FAIL | Latency 0.00s cached
Axe WCAG: 4 | BP: 8 | $0.3592
Assertions ✅
- ✅: Has an h1 (R): pass
- ✅: Has single h1 (BP): pass
- ✅: Has at least one h2 (R): pass
- ✅: Has a single banner (R): pass
- ✅: Has a single maincontent (R): pass
- ✅: Has at least one navigation (R): pass
- ✅: Has a single footer (R): pass
Axe WCAG Failures (4) ❌
- (3x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
- (1x) - link-in-text-block (serious): Ensure links are distinguished from surrounding text in a way that does not rely on color
Axe Best Practice Issues (8) ⚠️
- aria-allowed-role (minor): Ensure role attribute has an appropriate value for the element (Best Practice - does not affect pass/fail)
Sample 4 (Claude Opus 4.6)
Instruction set: 1. Basic
FAIL | Latency 0.00s cached
Axe WCAG: 8 | $0.3898
Assertions ✅
- ✅: Has an h1 (R): pass
- ✅: Has single h1 (BP): pass
- ✅: Has at least one h2 (R): pass
- ✅: Has a single banner (R): pass
- ✅: Has a single maincontent (R): pass
- ✅: Has at least one navigation (R): pass
- ✅: Has a single footer (R): pass
Axe WCAG Failures (8) ❌
- (8x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Sample 0 (Claude Opus 4.6)
Instruction set: 0. Minimal
FAIL | Latency 0.00s cached
Axe WCAG: 59 | BP: 1 | $0.3150
Assertions ✅
- ✅: Has an h1 (R): pass
- ✅: Has single h1 (BP): pass
- ✅: Has at least one h2 (R): pass
- ✅: Has a single banner (R): pass
- ✅: Has a single maincontent (R): pass
- ✅: Has at least one navigation (R): pass
- ✅: Has a single footer (R): pass
Axe WCAG Failures (59) ❌
- (59x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (1) ⚠️
- heading-order (moderate): Ensure the order of headings is semantically correct (Best Practice - does not affect pass/fail)
Sample 1 (Claude Opus 4.6)
Instruction set: 0. Minimal
FAIL | Latency 0.00s cached
Axe WCAG: 65 | $0.3330
Assertions ✅
- ✅: Has an h1 (R): pass
- ✅: Has single h1 (BP): pass
- ✅: Has at least one h2 (R): pass
- ✅: Has a single banner (R): pass
- ✅: Has a single maincontent (R): pass
- ✅: Has at least one navigation (R): pass
- ✅: Has a single footer (R): pass
Axe WCAG Failures (65) ❌
- (65x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Sample 2 (Claude Opus 4.6)
Instruction set: 0. Minimal
FAIL | Latency 0.00s cached
Axe WCAG: 39 | $0.2883
Assertions ✅
- ✅: Has an h1 (R): pass
- ✅: Has single h1 (BP): pass
- ✅: Has at least one h2 (R): pass
- ✅: Has a single banner (R): pass
- ✅: Has a single maincontent (R): pass
- ✅: Has at least one navigation (R): pass
- ✅: Has a single footer (R): pass
Axe WCAG Failures (39) ❌
- (39x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Sample 3 (Claude Opus 4.6)
Instruction set: 0. Minimal
FAIL | Latency 0.00s cached
Axe WCAG: 51 | $0.3145
Assertions ✅
- ✅: Has an h1 (R): pass
- ✅: Has single h1 (BP): pass
- ✅: Has at least one h2 (R): pass
- ✅: Has a single banner (R): pass
- ✅: Has a single maincontent (R): pass
- ✅: Has at least one navigation (R): pass
- ✅: Has a single footer (R): pass
Axe WCAG Failures (51) ❌
- (51x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Sample 4 (Claude Opus 4.6)
Instruction set: 0. Minimal
FAIL | Latency 0.00s cached
Axe WCAG: 42 | BP: 1 | $0.2746
Assertions ✅
- ✅: Has an h1 (R): pass
- ✅: Has single h1 (BP): pass
- ✅: Has at least one h2 (R): pass
- ✅: Has a single banner (R): pass
- ✅: Has a single maincontent (R): pass
- ✅: Has at least one navigation (R): pass
- ✅: Has a single footer (R): pass
Axe WCAG Failures (42) ❌
- (1x) - aria-prohibited-attr (serious): Ensure ARIA attributes are not prohibited for an element's role
- (41x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (1) ⚠️
- heading-order (moderate): Ensure the order of headings is semantically correct (Best Practice - does not affect pass/fail)
Sample 0 (Claude Opus 4.6)
Instruction set: 2. Detailed Instructions
FAIL | Latency 0.00s cached
Axe WCAG: 1 | $0.4017
Assertions ✅
- ✅: Has an h1 (R): pass
- ✅: Has single h1 (BP): pass
- ✅: Has at least one h2 (R): pass
- ✅: Has a single banner (R): pass
- ✅: Has a single maincontent (R): pass
- ✅: Has at least one navigation (R): pass
- ✅: Has a single footer (R): pass
Axe WCAG Failures (1) ❌
- (1x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Sample 1 (Claude Opus 4.6)
Instruction set: 2. Detailed Instructions
PASS | Latency 0.00s cached
Axe WCAG: 0 | $0.3374
Assertions ✅
- ✅: Has an h1 (R): pass
- ✅: Has single h1 (BP): pass
- ✅: Has at least one h2 (R): pass
- ✅: Has a single banner (R): pass
- ✅: Has a single maincontent (R): pass
- ✅: Has at least one navigation (R): pass
- ✅: Has a single footer (R): pass
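The PASS and FAIL verdicts in these samples look consistent with a simple rule: a sample passes only when it has zero Axe WCAG failures and every required assertion passes, while best-practice findings are reported but ignored. The sketch below illustrates that rule under stated assumptions. The type and function names are hypothetical, and reading "(R)" as required and "(BP)" as best practice is an inference from the listings, not something the report states.

```python
from dataclasses import dataclass


@dataclass
class Assertion:
    name: str
    required: bool  # assumption: True for "(R)" items, False for "(BP)" items
    passed: bool


@dataclass
class SampleResult:
    axe_wcag_failures: int
    axe_best_practice_issues: int  # reported only; does not affect pass/fail
    assertions: list[Assertion]


def sample_passes(result: SampleResult) -> bool:
    """Hypothetical verdict rule inferred from the sample listings."""
    required_ok = all(a.passed for a in result.assertions if a.required)
    return result.axe_wcag_failures == 0 and required_ok


# Mirrors a sample with no Axe WCAG failures and all assertions passing.
clean = SampleResult(0, 0, [Assertion("Has an h1", True, True)])
# Mirrors a sample with 78 contrast failures and a failed required assertion.
broken = SampleResult(78, 57, [Assertion("Has a single maincontent", True, False)])
print(sample_passes(clean), sample_passes(broken))
```

Under this rule a sample whose only findings are best-practice issues would still pass, which matches the "does not affect pass/fail" notes in the listings.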
Sample 2 (Claude Opus 4.6)
Instruction set: 2. Detailed Instructions
FAIL | Latency 133.60s
Axe WCAG: 5 | $0.3433
Assertions ✅
- ✅: Has an h1 (R): pass
- ✅: Has single h1 (BP): pass
- ✅: Has at least one h2 (R): pass
- ✅: Has a single banner (R): pass
- ✅: Has a single maincontent (R): pass
- ✅: Has at least one navigation (R): pass
- ✅: Has a single footer (R): pass
Axe WCAG Failures (5) ❌
- (5x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Sample 3 (Claude Opus 4.6)
Instruction set: 2. Detailed Instructions
PASS | Latency 94.43s
Axe WCAG: 0 | $0.2545
Assertions ✅
- ✅: Has an h1 (R): pass
- ✅: Has single h1 (BP): pass
- ✅: Has at least one h2 (R): pass
- ✅: Has a single banner (R): pass
- ✅: Has a single maincontent (R): pass
- ✅: Has at least one navigation (R): pass
- ✅: Has a single footer (R): pass
Sample 4 (Claude Opus 4.6)
Instruction set: 2. Detailed Instructions
FAIL | Latency 100.51s
Axe WCAG: 6 | $0.2790
Assertions ✅
- ✅: Has an h1 (R): pass
- ✅: Has single h1 (BP): pass
- ✅: Has at least one h2 (R): pass
- ✅: Has a single banner (R): pass
- ✅: Has a single maincontent (R): pass
- ✅: Has at least one navigation (R): pass
- ✅: Has a single footer (R): pass
Axe WCAG Failures (6) ❌
- (6x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
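Nearly every Axe WCAG failure listed for this model is color-contrast, so it may help to see how the underlying ratio is computed. The sketch below follows the WCAG 2 definitions of relative luminance and contrast ratio, with the AA thresholds of 4.5:1 for normal text and 3:1 for large text. The function names are illustrative. Axe's real rule does more work, such as resolving the effective background color, so treat this as an approximation of the check rather than the tool's implementation.

```python
def _linearize(channel: int) -> float:
    """Convert an 8-bit sRGB channel to linear light (WCAG 2 definition)."""
    s = channel / 255
    return s / 12.92 if s <= 0.03928 else ((s + 0.055) / 1.055) ** 2.4


def relative_luminance(hex_color: str) -> float:
    """Relative luminance of a color given as '#rrggbb'."""
    h = hex_color.lstrip("#")
    r, g, b = (int(h[i:i + 2], 16) for i in (0, 2, 4))
    return 0.2126 * _linearize(r) + 0.7152 * _linearize(g) + 0.0722 * _linearize(b)


def contrast_ratio(foreground: str, background: str) -> float:
    """Contrast ratio between two colors, from 1.0 up to 21.0."""
    l1 = relative_luminance(foreground)
    l2 = relative_luminance(background)
    lighter, darker = max(l1, l2), min(l1, l2)
    return (lighter + 0.05) / (darker + 0.05)


def meets_aa(foreground: str, background: str, large_text: bool = False) -> bool:
    """Check the WCAG 2 AA minimum: 4.5:1 for normal text, 3:1 for large text."""
    threshold = 3.0 if large_text else 4.5
    return contrast_ratio(foreground, background) >= threshold


print(round(contrast_ratio("#767676", "#ffffff"), 2))  # 4.54, passes AA
print(round(contrast_ratio("#777777", "#ffffff"), 2))  # 4.48, fails AA for normal text
```

The two grays differ by one step per channel, which shows how easily a light-gray-on-white palette can fall just under the threshold.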
Claude Sonnet 4.5
Aggregates are shown when filtering to a specific instruction set.
| Samples | Passes | pass@1 | pass@5 | pass@10 |
|---|---|---|---|---|
| 25 | 0 | 0% | 0% | 0% |
| 5 | 0 | 0% | 0% | 0% |
| 5 | 0 | 0% | 0% | 0% |
| 5 | 5 | 100% | 100% | 100% |
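The pass@k figures above can be reproduced from the sample and pass counts alone. The sketch below uses the standard combinatorial estimator, 1 - C(n-c, k) / C(n, k), which matches the published numbers (for example, 25 samples with 9 passes gives 36%, 92%, and 100% in the summary table). Whether the report uses exactly this formula is an assumption, and so is clamping k to n when fewer than k samples exist, as with the 5-sample sets. The function name is illustrative.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Estimate the chance that at least one of k samples, drawn without
    replacement from n samples of which c passed, is a pass."""
    if not 0 <= c <= n:
        raise ValueError("c must be between 0 and n")
    k = min(k, n)  # assumption: clamp k when fewer than k samples exist
    if k <= 0:
        return 0.0
    if n - c < k:
        return 1.0  # too few failures to fill all k draws
    return 1.0 - comb(n - c, k) / comb(n, k)


for k in (1, 5, 10):
    print(f"pass@{k}: {pass_at_k(25, 9, k):.0%}")
# pass@1: 36%
# pass@5: 92%
# pass@10: 100%
```

With zero passes the estimate is 0% for every k, and with all samples passing it is 100%, which is why the rows above contain only those two values.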
Sample 0 (Claude Sonnet 4.5)
Instruction set: Control
FAIL | Latency 0.00s cached
Axe WCAG: 26 | BP: 54 | $0.0929
Assertions ❌
- ✅: Has an h1 (R): pass
- ✅: Has single h1 (BP): pass
- ✅: Has at least one h2 (R): pass
- ✅: Has a single banner (R): pass
- ❌: Has a single maincontent (R): fail
- ✅: Has at least one navigation (R): pass
- ✅: Has a single footer (R): pass
Axe WCAG Failures (26) ❌
- (26x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (54) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 1 (Claude Sonnet 4.5)
Instruction set: Control
FAIL | Latency 0.00s cached
Axe WCAG: 36 | BP: 40 | $0.1042
Assertions ❌
- ✅: Has an h1 (R): pass
- ✅: Has single h1 (BP): pass
- ✅: Has at least one h2 (R): pass
- ✅: Has a single banner (R): pass
- ❌: Has a single maincontent (R): fail
- ✅: Has at least one navigation (R): pass
- ✅: Has a single footer (R): pass
Axe WCAG Failures (36) ❌
- (36x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (40) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 2 (Claude Sonnet 4.5)
Instruction set: Control
FAIL | Latency 0.00s cached
Axe WCAG: 38 | BP: 41 | $0.1045
Assertions ❌
- ✅: Has an h1 (R): pass
- ✅: Has single h1 (BP): pass
- ✅: Has at least one h2 (R): pass
- ✅: Has a single banner (R): pass
- ❌: Has a single maincontent (R): fail
- ✅: Has at least one navigation (R): pass
- ✅: Has a single footer (R): pass
Axe WCAG Failures (38) ❌
- (38x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (41) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 3 (Claude Sonnet 4.5)
Instruction set: Control
FAIL | Latency 0.00s cached
Axe WCAG: 34 | BP: 55 | $0.1067
Assertions ❌
- ✅: Has an h1 (R): pass
- ✅: Has single h1 (BP): pass
- ✅: Has at least one h2 (R): pass
- ✅: Has a single banner (R): pass
- ❌: Has a single maincontent (R): fail
- ✅: Has at least one navigation (R): pass
- ✅: Has a single footer (R): pass
Axe WCAG Failures (34) ❌
- (34x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (55) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 4 (Claude Sonnet 4.5)
Instruction set: Control
FAIL | Latency 0.00s cached
Axe WCAG: 23 | BP: 48 | $0.0959
Assertions ❌
- ✅: Has an h1 (R): pass
- ✅: Has single h1 (BP): pass
- ✅: Has at least one h2 (R): pass
- ✅: Has a single banner (R): pass
- ❌: Has a single maincontent (R): fail
- ✅: Has at least one navigation (R): pass
- ✅: Has a single footer (R): pass
Axe WCAG Failures (23) ❌
- (23x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (48) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 5 (Claude Sonnet 4.5)
Instruction set: Control
FAIL | Latency 0.00s cached
Axe WCAG: 23 | BP: 45 | $0.0990
Assertions ❌
- ✅: Has an h1 (R): pass
- ✅: Has single h1 (BP): pass
- ✅: Has at least one h2 (R): pass
- ✅: Has a single banner (R): pass
- ❌: Has a single maincontent (R): fail
- ✅: Has at least one navigation (R): pass
- ✅: Has a single footer (R): pass
Axe WCAG Failures (23) ❌
- (23x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (45) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 6 (Claude Sonnet 4.5)
Instruction set: Control
FAIL | Latency 0.00s cached
Axe WCAG: 43 | BP: 62 | $0.0950
Assertions ❌
- ✅: Has an h1 (R): pass
- ✅: Has single h1 (BP): pass
- ✅: Has at least one h2 (R): pass
- ✅: Has a single banner (R): pass
- ❌: Has a single maincontent (R): fail
- ✅: Has at least one navigation (R): pass
- ✅: Has a single footer (R): pass
Axe WCAG Failures (43) ❌
- (43x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (62) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 7 (Claude Sonnet 4.5)
Instruction set: Control
FAIL | Latency 0.00s cached
Axe WCAG: 23 | BP: 48 | $0.0990
Assertions ❌
- ✅: Has an h1 (R): pass
- ✅: Has single h1 (BP): pass
- ✅: Has at least one h2 (R): pass
- ✅: Has a single banner (R): pass
- ❌: Has a single maincontent (R): fail
- ✅: Has at least one navigation (R): pass
- ✅: Has a single footer (R): pass
Axe WCAG Failures (23) ❌
- (23x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (48) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 8 (Claude Sonnet 4.5)
Instruction set: Control
FAIL | Latency 0.00s cached
Axe WCAG: 43 | BP: 40 | $0.0982
Assertions ❌
- ✅: Has an h1 (R): pass
- ✅: Has single h1 (BP): pass
- ✅: Has at least one h2 (R): pass
- ✅: Has a single banner (R): pass
- ❌: Has a single maincontent (R): fail
- ✅: Has at least one navigation (R): pass
- ✅: Has a single footer (R): pass
Axe WCAG Failures (43) ❌
- (43x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (40) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 9 (Claude Sonnet 4.5)
Instruction set: Control
FAIL | Latency 0.00s cached
Axe WCAG: 20 | BP: 33 | $0.0952
Assertions ❌
- ✅: Has an h1 (R): pass
- ✅: Has single h1 (BP): pass
- ✅: Has at least one h2 (R): pass
- ✅: Has a single banner (R): pass
- ❌: Has a single maincontent (R): fail
- ✅: Has at least one navigation (R): pass
- ✅: Has a single footer (R): pass
Axe WCAG Failures (20) ❌
- (20x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (33) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 10 (Claude Sonnet 4.5)
Instruction set: Control
FAIL | Latency 0.00s cached
Axe WCAG: 31 | BP: 56 | $0.1072
Assertions ❌
- ✅: Has an h1 (R): pass
- ✅: Has single h1 (BP): pass
- ✅: Has at least one h2 (R): pass
- ✅: Has a single banner (R): pass
- ❌: Has a single maincontent (R): fail
- ✅: Has at least one navigation (R): pass
- ✅: Has a single footer (R): pass
Axe WCAG Failures (31) ❌
- (31x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (56) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 11 (Claude Sonnet 4.5)
Instruction set: Control
FAIL | Latency 0.00s cached
Axe WCAG: 51 | BP: 75 | $0.1149
Assertions ❌
- ✅: Has an h1 (R): pass
- ✅: Has single h1 (BP): pass
- ✅: Has at least one h2 (R): pass
- ✅: Has a single banner (R): pass
- ❌: Has a single maincontent (R): fail
- ✅: Has at least one navigation (R): pass
- ✅: Has a single footer (R): pass
Axe WCAG Failures (51) ❌
- (51x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (75) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 12 (Claude Sonnet 4.5)
Instruction set: Control
FAIL | Latency 0.00s cached
Axe WCAG: 28 | BP: 65 | $0.1021
Assertions ❌
- ✅: Has an h1 (R): pass
- ✅: Has single h1 (BP): pass
- ✅: Has at least one h2 (R): pass
- ✅: Has a single banner (R): pass
- ❌: Has a single maincontent (R): fail
- ✅: Has at least one navigation (R): pass
- ✅: Has a single footer (R): pass
Axe WCAG Failures (28) ❌
- (28x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (65) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 13 (Claude Sonnet 4.5)
Instruction set: Control
FAIL | Latency 0.00s cached
Axe WCAG: 20 | BP: 33 | $0.0969
Assertions ❌
- ✅: Has an h1 (R): pass
- ✅: Has single h1 (BP): pass
- ✅: Has at least one h2 (R): pass
- ✅: Has a single banner (R): pass
- ❌: Has a single maincontent (R): fail
- ✅: Has at least one navigation (R): pass
- ✅: Has a single footer (R): pass
Axe WCAG Failures (20) ❌
- (20x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (33) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 14 (Claude Sonnet 4.5)
Instruction set: Control
FAIL | Latency 0.00s cached
Axe WCAG: 24 | BP: 33 | $0.0996
Assertions ❌
- ✅: Has an h1 (R): pass
- ✅: Has single h1 (BP): pass
- ✅: Has at least one h2 (R): pass
- ✅: Has a single banner (R): pass
- ❌: Has a single maincontent (R): fail
- ✅: Has at least one navigation (R): pass
- ✅: Has a single footer (R): pass
Axe WCAG Failures (24) ❌
- (24x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (33) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 15 (Claude Sonnet 4.5)
Instruction set: Control
FAIL | Latency 0.00s cached
Axe WCAG: 24 | BP: 56 | $0.1020
Assertions ❌
- ✅: Has an h1 (R): pass
- ✅: Has single h1 (BP): pass
- ✅: Has at least one h2 (R): pass
- ✅: Has a single banner (R): pass
- ❌: Has a single maincontent (R): fail
- ✅: Has at least one navigation (R): pass
- ✅: Has a single footer (R): pass
Axe WCAG Failures (24) ❌
- (24x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (56) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 16 (Claude Sonnet 4.5)
Instruction set: Control
FAIL | Latency 0.00s cached
Axe WCAG: 62 | BP: 76 | $0.1246
Assertions ❌
- ✅: Has an h1 (R): pass
- ✅: Has single h1 (BP): pass
- ✅: Has at least one h2 (R): pass
- ✅: Has a single banner (R): pass
- ❌: Has a single maincontent (R): fail
- ✅: Has at least one navigation (R): pass
- ✅: Has a single footer (R): pass
Axe WCAG Failures (62) ❌
- (62x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (76) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 17 (Claude Sonnet 4.5)
Instruction set: Control
FAIL | Latency 0.00s cached
Axe WCAG: 26 | BP: 53 | $0.0987
Assertions ❌
- ✅: Has an h1 (R): pass
- ✅: Has single h1 (BP): pass
- ✅: Has at least one h2 (R): pass
- ✅: Has a single banner (R): pass
- ❌: Has a single maincontent (R): fail
- ✅: Has at least one navigation (R): pass
- ✅: Has a single footer (R): pass
Axe WCAG Failures (26) ❌
- (26x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (53) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 18 (Claude Sonnet 4.5)
Instruction set: Control
FAIL | Latency 0.00s cached
Axe WCAG: 23 | BP: 39 | $0.0984
Assertions ❌
- ✅: Has an h1 (R): pass
- ✅: Has single h1 (BP): pass
- ✅: Has at least one h2 (R): pass
- ✅: Has a single banner (R): pass
- ❌: Has a single maincontent (R): fail
- ✅: Has at least one navigation (R): pass
- ✅: Has a single footer (R): pass
Axe WCAG Failures (23) ❌
- (23x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (39) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 19 (Claude Sonnet 4.5)
Instruction set: Control
FAIL | Latency 0.00s cached
Axe WCAG: 53 | BP: 63 | $0.1175
Assertions ❌
- ✅: Has an h1 (R): pass
- ✅: Has single h1 (BP): pass
- ✅: Has at least one h2 (R): pass
- ✅: Has a single banner (R): pass
- ❌: Has a single maincontent (R): fail
- ✅: Has at least one navigation (R): pass
- ✅: Has a single footer (R): pass
Axe WCAG Failures (53) ❌
- (53x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (63) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 20 (Claude Sonnet 4.5)
Instruction set: Control
FAIL | Latency 0.00s cached
Axe WCAG: 33 | BP: 56 | $0.1026
Assertions ❌
- ✅: Has an h1 (R): pass
- ✅: Has single h1 (BP): pass
- ✅: Has at least one h2 (R): pass
- ✅: Has a single banner (R): pass
- ❌: Has a single maincontent (R): fail
- ✅: Has at least one navigation (R): pass
- ✅: Has a single footer (R): pass
Axe WCAG Failures (33) ❌
- (33x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (56) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 21 (Claude Sonnet 4.5)
Instruction set: Control
FAIL | Latency 0.00s cached
Axe WCAG: 26 | BP: 33 | $0.0966
Assertions ❌
- ✅: Has an h1 (R): pass
- ✅: Has single h1 (BP): pass
- ✅: Has at least one h2 (R): pass
- ✅: Has a single banner (R): pass
- ❌: Has a single maincontent (R): fail
- ✅: Has at least one navigation (R): pass
- ✅: Has a single footer (R): pass
Axe WCAG Failures (26) ❌
- (26x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (33) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 22 (Claude Sonnet 4.5)
Instruction set: Control
FAIL | Latency 0.00s cached
Axe WCAG: 37 | BP: 41 | $0.0990
Assertions ❌
- ✅: Has an h1 (R): pass
- ✅: Has single h1 (BP): pass
- ✅: Has at least one h2 (R): pass
- ✅: Has a single banner (R): pass
- ❌: Has a single maincontent (R): fail
- ✅: Has at least one navigation (R): pass
- ✅: Has a single footer (R): pass
Axe WCAG Failures (37) ❌
- (37x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (41) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 23 (Claude Sonnet 4.5)
Instruction set: Control
FAIL | Latency 0.00s cached
Axe WCAG: 41 | BP: 42 | $0.1092
Assertions ❌
- ✅: Has an h1 (R): pass
- ✅: Has single h1 (BP): pass
- ✅: Has at least one h2 (R): pass
- ✅: Has a single banner (R): pass
- ❌: Has a single maincontent (R): fail
- ✅: Has at least one navigation (R): pass
- ✅: Has a single footer (R): pass
Axe WCAG Failures (41) ❌
- (41x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (42) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 24 (Claude Sonnet 4.5)
Instruction set: Control
FAIL | Latency 0.00s cached
Axe WCAG: 24 | BP: 33 | $0.1013
Assertions ❌
- ✅: Has an h1 (R): pass
- ✅: Has single h1 (BP): pass
- ✅: Has at least one h2 (R): pass
- ✅: Has a single banner (R): pass
- ❌: Has a single maincontent (R): fail
- ✅: Has at least one navigation (R): pass
- ✅: Has a single footer (R): pass
Axe WCAG Failures (24) ❌
- (24x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (33) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
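The "Has a single maincontent" assertion that fails in the Control samples above can be approximated with a simple landmark count. The following is a minimal sketch, not the harness's actual implementation: it assumes the check counts `<main>` elements plus elements with an explicit `role="main"`, and the names `MainLandmarkCounter` and `has_single_main` are hypothetical.

```python
from html.parser import HTMLParser


class MainLandmarkCounter(HTMLParser):
    """Counts elements that expose a main landmark (assumed rule, see lead-in)."""

    def __init__(self) -> None:
        super().__init__()
        self.main_count = 0

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names for us.
        role = (dict(attrs).get("role") or "").strip().lower()
        first_role = role.split()[0] if role else ""
        if first_role == "main" or (tag == "main" and not first_role):
            self.main_count += 1


def has_single_main(html: str) -> bool:
    """Return True when the markup contains exactly one main landmark."""
    counter = MainLandmarkCounter()
    counter.feed(html)
    return counter.main_count == 1


if __name__ == "__main__":
    with_main = "<header></header><main><h1>Title</h1></main><footer></footer>"
    without_main = "<header></header><div><h1>Title</h1></div><footer></footer>"
    print(has_single_main(with_main))
    print(has_single_main(without_main))
```

A page that wraps its content in a plain `<div>` fails this check, which is also the situation in which axe reports `landmark-one-main` and `region`.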
Sample 0 (Claude Sonnet 4.5)
Instruction set: 1. Basic
FAIL | Latency 0.00s cached
Axe WCAG: 36 | $0.1040
Assertions ✅
- ✅: Has an h1 (R): pass
- ✅: Has single h1 (BP): pass
- ✅: Has at least one h2 (R): pass
- ✅: Has a single banner (R): pass
- ✅: Has a single maincontent (R): pass
- ✅: Has at least one navigation (R): pass
- ✅: Has a single footer (R): pass
Axe WCAG Failures (36) ❌
- (36x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Sample 1 (Claude Sonnet 4.5)
Instruction set: 1. Basic
FAIL | Latency 0.00s cached
Axe WCAG: 36 | $0.0938
Assertions ✅
- ✅: Has an h1 (R): pass
- ✅: Has single h1 (BP): pass
- ✅: Has at least one h2 (R): pass
- ✅: Has a single banner (R): pass
- ✅: Has a single maincontent (R): pass
- ✅: Has at least one navigation (R): pass
- ✅: Has a single footer (R): pass
Axe WCAG Failures (36) ❌
- (36x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Sample 2 (Claude Sonnet 4.5)
Instruction set: 1. Basic
FAIL | Latency 0.00s cached
Axe WCAG: 30 | $0.0997
Assertions ✅
- ✅: Has an h1 (R): pass
- ✅: Has single h1 (BP): pass
- ✅: Has at least one h2 (R): pass
- ✅: Has a single banner (R): pass
- ✅: Has a single maincontent (R): pass
- ✅: Has at least one navigation (R): pass
- ✅: Has a single footer (R): pass
Axe WCAG Failures (30) ❌
- (30x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Sample 3 (Claude Sonnet 4.5)
Instruction set: 1. Basic
FAIL | Latency 0.00s cached
Axe WCAG: 25 | $0.1013
Assertions ✅
- ✅: Has an h1 (R): pass
- ✅: Has single h1 (BP): pass
- ✅: Has at least one h2 (R): pass
- ✅: Has a single banner (R): pass
- ✅: Has a single maincontent (R): pass
- ✅: Has at least one navigation (R): pass
- ✅: Has a single footer (R): pass
Axe WCAG Failures (25) ❌
- (25x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Sample 4 (Claude Sonnet 4.5)
Instruction set: 1. Basic
FAIL | Latency 0.00s cached
Axe WCAG: 36 | $0.1064
Assertions ✅
- ✅: Has an h1 (R): pass
- ✅: Has single h1 (BP): pass
- ✅: Has at least one h2 (R): pass
- ✅: Has a single banner (R): pass
- ✅: Has a single maincontent (R): pass
- ✅: Has at least one navigation (R): pass
- ✅: Has a single footer (R): pass
Axe WCAG Failures (36) ❌
- (36x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Sample 0 (Claude Sonnet 4.5)
Instruction set: 0. Minimal
FAIL | Latency 0.00s cached
Axe WCAG: 33 | $0.1097
Assertions ✅
- ✅: Has an h1 (R): pass
- ✅: Has single h1 (BP): pass
- ✅: Has at least one h2 (R): pass
- ✅: Has a single banner (R): pass
- ✅: Has a single maincontent (R): pass
- ✅: Has at least one navigation (R): pass
- ✅: Has a single footer (R): pass
Axe WCAG Failures (33) ❌
- (33x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Sample 1 (Claude Sonnet 4.5)
Instruction set: 0. Minimal
FAIL | Latency 0.00s cached
Axe WCAG: 30 | $0.1024
Assertions ✅
- ✅: Has an h1 (R): pass
- ✅: Has single h1 (BP): pass
- ✅: Has at least one h2 (R): pass
- ✅: Has a single banner (R): pass
- ✅: Has a single maincontent (R): pass
- ✅: Has at least one navigation (R): pass
- ✅: Has a single footer (R): pass
Axe WCAG Failures (30) ❌
- (30x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Sample 2 (Claude Sonnet 4.5)
Instruction set: 0. Minimal
FAIL | Latency 0.00s cached
Axe WCAG: 56 | $0.1059
Assertions ✅
- ✅: Has an h1 (R): pass
- ✅: Has single h1 (BP): pass
- ✅: Has at least one h2 (R): pass
- ✅: Has a single banner (R): pass
- ✅: Has a single maincontent (R): pass
- ✅: Has at least one navigation (R): pass
- ✅: Has a single footer (R): pass
Axe WCAG Failures (56) ❌
- (56x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Sample 3 (Claude Sonnet 4.5)
Instruction set: 0. Minimal
FAIL | Latency 0.00s cached
Axe WCAG: 28 | $0.1033
Assertions ✅
- ✅: Has an h1 (R): pass
- ✅: Has single h1 (BP): pass
- ✅: Has at least one h2 (R): pass
- ✅: Has a single banner (R): pass
- ✅: Has a single maincontent (R): pass
- ✅: Has at least one navigation (R): pass
- ✅: Has a single footer (R): pass
Axe WCAG Failures (28) ❌
- (28x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Sample 4 (Claude Sonnet 4.5)
Instruction set: 0. Minimal
FAIL | Latency 0.00s cached
Axe WCAG: 24 | $0.0955
Assertions ✅
- ✅: Has an h1 (R): pass
- ✅: Has single h1 (BP): pass
- ✅: Has at least one h2 (R): pass
- ✅: Has a single banner (R): pass
- ✅: Has a single maincontent (R): pass
- ✅: Has at least one navigation (R): pass
- ✅: Has a single footer (R): pass
Axe WCAG Failures (24) ❌
- (24x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Sample 0 (Claude Sonnet 4.5)
Instruction set: 2. Detailed Instructions
PASS | Latency 0.00s cached
Axe WCAG: 0 | $0.1116
Assertions ✅
- ✅: Has an h1 (R): pass
- ✅: Has single h1 (BP): pass
- ✅: Has at least one h2 (R): pass
- ✅: Has a single banner (R): pass
- ✅: Has a single maincontent (R): pass
- ✅: Has at least one navigation (R): pass
- ✅: Has a single footer (R): pass
Sample 1 (Claude Sonnet 4.5)
Instruction set: 2. Detailed Instructions
PASS | Latency 0.00s cached
Axe WCAG: 0 | $0.1164
Assertions ✅
- ✅: Has an h1 (R): pass
- ✅: Has single h1 (BP): pass
- ✅: Has at least one h2 (R): pass
- ✅: Has a single banner (R): pass
- ✅: Has a single maincontent (R): pass
- ✅: Has at least one navigation (R): pass
- ✅: Has a single footer (R): pass
Sample 2 (Claude Sonnet 4.5)
Instruction set: 2. Detailed Instructions
PASS | Latency 64.05s
Axe WCAG: 0 | $0.1068
Assertions ✅
- ✅: Has an h1 (R): pass
- ✅: Has single h1 (BP): pass
- ✅: Has at least one h2 (R): pass
- ✅: Has a single banner (R): pass
- ✅: Has a single maincontent (R): pass
- ✅: Has at least one navigation (R): pass
- ✅: Has a single footer (R): pass
Sample 3 (Claude Sonnet 4.5)
Instruction set: 2. Detailed Instructions
PASS | Latency 86.39s
Axe WCAG: 0 | $0.1468
Assertions ✅
- ✅: Has an h1 (R): pass
- ✅: Has single h1 (BP): pass
- ✅: Has at least one h2 (R): pass
- ✅: Has a single banner (R): pass
- ✅: Has a single maincontent (R): pass
- ✅: Has at least one navigation (R): pass
- ✅: Has a single footer (R): pass
Sample 4 (Claude Sonnet 4.5)
Instruction set: 2. Detailed Instructions
PASS | Latency 73.32s
Axe WCAG: 0 | $0.1244
Assertions ✅
- ✅: Has an h1 (R): pass
- ✅: Has single h1 (BP): pass
- ✅: Has at least one h2 (R): pass
- ✅: Has a single banner (R): pass
- ✅: Has a single maincontent (R): pass
- ✅: Has at least one navigation (R): pass
- ✅: Has a single footer (R): pass
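Nearly every failing sample above is failing on `color-contrast`. As a rough illustration of what that rule measures, the sketch below computes the WCAG 2 contrast ratio between two opaque hex colors and compares it with the AA thresholds (4.5:1 for normal text, 3:1 for large text). It is an assumption-laden simplification: axe also has to resolve the effective background, opacity, and font size from the rendered page, none of which is modeled here, and the function names are illustrative.

```python
def _linear(channel: int) -> float:
    """Convert an 8-bit sRGB channel to its linear-light value (WCAG 2.x formula)."""
    s = channel / 255
    return s / 12.92 if s <= 0.03928 else ((s + 0.055) / 1.055) ** 2.4


def relative_luminance(hex_color: str) -> float:
    """Relative luminance of a six-digit hex color such as '#1a2b3c'."""
    h = hex_color.lstrip("#")
    r, g, b = (int(h[i:i + 2], 16) for i in (0, 2, 4))
    return 0.2126 * _linear(r) + 0.7152 * _linear(g) + 0.0722 * _linear(b)


def contrast_ratio(fg: str, bg: str) -> float:
    """WCAG contrast ratio, always expressed as lighter over darker."""
    l1, l2 = relative_luminance(fg), relative_luminance(bg)
    lighter, darker = max(l1, l2), min(l1, l2)
    return (lighter + 0.05) / (darker + 0.05)


def meets_aa(fg: str, bg: str, large_text: bool = False) -> bool:
    """True when the pair meets the AA minimum for the given text size."""
    return contrast_ratio(fg, bg) >= (3.0 if large_text else 4.5)


if __name__ == "__main__":
    for fg in ("#000000", "#767676", "#777777"):
        print(fg, round(contrast_ratio(fg, "#ffffff"), 2), meets_aa(fg, "#ffffff"))
```

Mid-gray text on white sits right at the boundary, so small palette choices can flip many elements at once. That may help explain why a single sample can report dozens of instances of the same rule.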
Gemini 3 Flash Preview
Control: 4%
0. Minimal: 40%
1. Basic: 80%
2. Detailed Instructions: 100%
Aggregates are shown when filtering to a specific instruction set.
Instruction set: Control
Samples: 25 | Passes: 1
| pass@1 | pass@5 | pass@10 |
|---|---|---|
| 4% | 20% | 40% |
Instruction set: 0. Minimal
Samples: 5 | Passes: 2
| pass@1 | pass@5 | pass@10 |
|---|---|---|
| 40% | 100% | 100% |
Instruction set: 1. Basic
Samples: 5 | Passes: 4
| pass@1 | pass@5 | pass@10 |
|---|---|---|
| 80% | 100% | 100% |
Instruction set: 2. Detailed Instructions
Samples: 5 | Passes: 5
| pass@1 | pass@5 | pass@10 |
|---|---|---|
| 100% | 100% | 100% |
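The report does not state how the pass@k values are computed. The figures above are consistent with the commonly used unbiased estimator 1 - C(n-c, k) / C(n, k), where n is the number of samples and c the number of passes. The sketch below assumes that estimator. Two edge cases are additional assumptions chosen to match the displayed rows: zero passes always yields 0%, and any k larger than the number of failing samples yields 100%.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Estimate the probability that at least one of k drawn samples passes.

    n: total samples, c: passing samples, k: draw size.
    """
    if c == 0:
        return 0.0  # assumption: no passes means 0%, even when k > n
    if n - c < k:
        return 1.0  # every draw of k samples must include a pass
    return 1.0 - comb(n - c, k) / comb(n, k)


if __name__ == "__main__":
    # (samples, passes) pairs taken from the Gemini 3 Flash Preview aggregates.
    for n, c in [(25, 1), (5, 2), (5, 4), (5, 5)]:
        row = " | ".join(f"{pass_at_k(n, c, k):.0%}" for k in (1, 5, 10))
        print(f"Samples: {n} | Passes: {c} | {row}")
```

With a single pass in 25 samples the estimator reduces to k/n, which is why pass@5 and pass@10 come out at 20% and 40%.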
Sample 0 (Gemini 3 Flash Preview)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 13 | BP: 5
Assertions ❌
- ✅: Has an h1 (R): pass
- ✅: Has single h1 (BP): pass
- ✅: Has at least one h2 (R): pass
- ✅: Has a single banner (R): pass
- ❌: Has a single maincontent (R): fail
- ✅: Has at least one navigation (R): pass
- ✅: Has a single footer (R): pass
Axe WCAG Failures (13) ❌
- (13x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (5) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 1 (Gemini 3 Flash Preview)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 4
Assertions ✅
- ✅: Has an h1 (R): pass
- ✅: Has single h1 (BP): pass
- ✅: Has at least one h2 (R): pass
- ✅: Has a single banner (R): pass
- ✅: Has a single maincontent (R): pass
- ✅: Has at least one navigation (R): pass
- ✅: Has a single footer (R): pass
Axe WCAG Failures (4) ❌
- (4x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Sample 2 (Gemini 3 Flash Preview)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 9 | BP: 1
Assertions ✅
- ✅: Has an h1 (R): pass
- ✅: Has single h1 (BP): pass
- ✅: Has at least one h2 (R): pass
- ✅: Has a single banner (R): pass
- ✅: Has a single maincontent (R): pass
- ✅: Has at least one navigation (R): pass
- ✅: Has a single footer (R): pass
Axe WCAG Failures (9) ❌
- (4x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
- (5x) - link-name (serious): Ensure links have discernible text
Axe Best Practice Issues (1) ⚠️
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 3 (Gemini 3 Flash Preview)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 15 | BP: 1
Assertions ✅
- ✅: Has an h1 (R): pass
- ✅: Has single h1 (BP): pass
- ✅: Has at least one h2 (R): pass
- ✅: Has a single banner (R): pass
- ✅: Has a single maincontent (R): pass
- ✅: Has at least one navigation (R): pass
- ✅: Has a single footer (R): pass
Axe WCAG Failures (15) ❌
- (15x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (1) ⚠️
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 4 (Gemini 3 Flash Preview)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 2 | BP: 4
Assertions ❌
- ✅: Has an h1 (R): pass
- ✅: Has single h1 (BP): pass
- ✅: Has at least one h2 (R): pass
- ✅: Has a single banner (R): pass
- ❌: Has a single maincontent (R): fail
- ✅: Has at least one navigation (R): pass
- ✅: Has a single footer (R): pass
Axe WCAG Failures (2) ❌
- (2x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (4) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 5 (Gemini 3 Flash Preview)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 10 | BP: 1
Assertions ✅
- ✅: Has an h1 (R): pass
- ✅: Has single h1 (BP): pass
- ✅: Has at least one h2 (R): pass
- ✅: Has a single banner (R): pass
- ✅: Has a single maincontent (R): pass
- ✅: Has at least one navigation (R): pass
- ✅: Has a single footer (R): pass
Axe WCAG Failures (10) ❌
- (10x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (1) ⚠️
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 6 (Gemini 3 Flash Preview)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 4 | BP: 4
Assertions ❌
- ✅: Has an h1 (R): pass
- ✅: Has single h1 (BP): pass
- ✅: Has at least one h2 (R): pass
- ✅: Has a single banner (R): pass
- ❌: Has a single maincontent (R): fail
- ✅: Has at least one navigation (R): pass
- ✅: Has a single footer (R): pass
Axe WCAG Failures (4) ❌
- (4x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (4) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 7 (Gemini 3 Flash Preview)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 4 | BP: 5
Assertions ❌
- ✅: Has an h1 (R): pass
- ✅: Has single h1 (BP): pass
- ✅: Has at least one h2 (R): pass
- ✅: Has a single banner (R): pass
- ❌: Has a single maincontent (R): fail
- ✅: Has at least one navigation (R): pass
- ✅: Has a single footer (R): pass
Axe WCAG Failures (4) ❌
- (4x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (5) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 8 (Gemini 3 Flash Preview)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 2 | BP: 1
Assertions ✅
- ✅: Has an h1 (R): pass
- ✅: Has single h1 (BP): pass
- ✅: Has at least one h2 (R): pass
- ✅: Has a single banner (R): pass
- ✅: Has a single maincontent (R): pass
- ✅: Has at least one navigation (R): pass
- ✅: Has a single footer (R): pass
Axe WCAG Failures (2) ❌
- (2x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (1) ⚠️
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 9 (Gemini 3 Flash Preview)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 8
Assertions ✅
- ✅: Has an h1 (R): pass
- ✅: Has single h1 (BP): pass
- ✅: Has at least one h2 (R): pass
- ✅: Has a single banner (R): pass
- ✅: Has a single maincontent (R): pass
- ✅: Has at least one navigation (R): pass
- ✅: Has a single footer (R): pass
Axe WCAG Failures (8) ❌
- (8x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Sample 10 (Gemini 3 Flash Preview)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 6 | BP: 1
Assertions ✅
- ✅: Has an h1 (R): pass
- ✅: Has single h1 (BP): pass
- ✅: Has at least one h2 (R): pass
- ✅: Has a single banner (R): pass
- ✅: Has a single maincontent (R): pass
- ✅: Has at least one navigation (R): pass
- ✅: Has a single footer (R): pass
Axe WCAG Failures (6) ❌
- (6x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (1) ⚠️
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 11 (Gemini 3 Flash Preview)
Instruction set: Control
PASS | Latency 0.00s
Axe WCAG: 0
Assertions ✅
- ✅: Has an h1 (R): pass
- ✅: Has single h1 (BP): pass
- ✅: Has at least one h2 (R): pass
- ✅: Has a single banner (R): pass
- ✅: Has a single maincontent (R): pass
- ✅: Has at least one navigation (R): pass
- ✅: Has a single footer (R): pass
Sample 12 (Gemini 3 Flash Preview)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 2 | BP: 1
Assertions ✅
- ✅: Has an h1 (R): pass
- ✅: Has single h1 (BP): pass
- ✅: Has at least one h2 (R): pass
- ✅: Has a single banner (R): pass
- ✅: Has a single maincontent (R): pass
- ✅: Has at least one navigation (R): pass
- ✅: Has a single footer (R): pass
Axe WCAG Failures (2) ❌
- (2x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (1) ⚠️
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 13 (Gemini 3 Flash Preview)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 1
Assertions ✅
- ✅: Has an h1 (R): pass
- ✅: Has single h1 (BP): pass
- ✅: Has at least one h2 (R): pass
- ✅: Has a single banner (R): pass
- ✅: Has a single maincontent (R): pass
- ✅: Has at least one navigation (R): pass
- ✅: Has a single footer (R): pass
Axe WCAG Failures (1) ❌
- (1x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Sample 14 (Gemini 3 Flash Preview)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 8 | BP: 1
Assertions ✅
- ✅: Has an h1 (R): pass
- ✅: Has single h1 (BP): pass
- ✅: Has at least one h2 (R): pass
- ✅: Has a single banner (R): pass
- ✅: Has a single maincontent (R): pass
- ✅: Has at least one navigation (R): pass
- ✅: Has a single footer (R): pass
Axe WCAG Failures (8) ❌
- (8x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (1) ⚠️
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 15 (Gemini 3 Flash Preview)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 4 | BP: 1
Assertions ✅
- ✅: Has an h1 (R): pass
- ✅: Has single h1 (BP): pass
- ✅: Has at least one h2 (R): pass
- ✅: Has a single banner (R): pass
- ✅: Has a single maincontent (R): pass
- ✅: Has at least one navigation (R): pass
- ✅: Has a single footer (R): pass
Axe WCAG Failures (4) ❌
- (4x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (1) ⚠️
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 16 (Gemini 3 Flash Preview)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 5 | BP: 1
Assertions ✅
- ✅: Has an h1 (R): pass
- ✅: Has single h1 (BP): pass
- ✅: Has at least one h2 (R): pass
- ✅: Has a single banner (R): pass
- ✅: Has a single maincontent (R): pass
- ✅: Has at least one navigation (R): pass
- ✅: Has a single footer (R): pass
Axe WCAG Failures (5) ❌
- (5x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (1) ⚠️
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 17 (Gemini 3 Flash Preview)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 4 | BP: 1
Assertions ✅
- ✅: Has an h1 (R): pass
- ✅: Has single h1 (BP): pass
- ✅: Has at least one h2 (R): pass
- ✅: Has a single banner (R): pass
- ✅: Has a single maincontent (R): pass
- ✅: Has at least one navigation (R): pass
- ✅: Has a single footer (R): pass
Axe WCAG Failures (4) ❌
- (4x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (1) ⚠️
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 18 (Gemini 3 Flash Preview)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 4
Assertions ✅
- ✅: Has an h1 (R): pass
- ✅: Has single h1 (BP): pass
- ✅: Has at least one h2 (R): pass
- ✅: Has a single banner (R): pass
- ✅: Has a single maincontent (R): pass
- ✅: Has at least one navigation (R): pass
- ✅: Has a single footer (R): pass
Axe WCAG Failures (4) ❌
- (4x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Sample 19 (Gemini 3 Flash Preview)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 11 | BP: 5
Assertions ❌
- ✅: Has an h1 (R): pass
- ✅: Has single h1 (BP): pass
- ✅: Has at least one h2 (R): pass
- ✅: Has a single banner (R): pass
- ❌: Has a single maincontent (R): fail
- ✅: Has at least one navigation (R): pass
- ✅: Has a single footer (R): pass
Axe WCAG Failures (11) ❌
- (11x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (5) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 20 (Gemini 3 Flash Preview)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 1 | BP: 4
Assertions ❌
- ✅: Has an h1 (R): pass
- ✅: Has single h1 (BP): pass
- ✅: Has at least one h2 (R): pass
- ✅: Has a single banner (R): pass
- ❌: Has a single maincontent (R): fail
- ✅: Has at least one navigation (R): pass
- ✅: Has a single footer (R): pass
Axe WCAG Failures (1) ❌
- (1x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (4) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 21 (Gemini 3 Flash Preview)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 3 | BP: 4
Assertions ❌
- ✅: Has an h1 (R): pass
- ✅: Has single h1 (BP): pass
- ✅: Has at least one h2 (R): pass
- ✅: Has a single banner (R): pass
- ❌: Has a single maincontent (R): fail
- ✅: Has at least one navigation (R): pass
- ✅: Has a single footer (R): pass
Axe WCAG Failures (3) ❌
- (3x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (4) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 22 (Gemini 3 Flash Preview)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 1 | BP: 4
Assertions ❌
- ✅: Has an h1 (R): pass
- ✅: Has single h1 (BP): pass
- ✅: Has at least one h2 (R): pass
- ✅: Has a single banner (R): pass
- ❌: Has a single maincontent (R): fail
- ✅: Has at least one navigation (R): pass
- ✅: Has a single footer (R): pass
Axe WCAG Failures (1) ❌
- (1x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (4) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 23 (Gemini 3 Flash Preview)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 3 | BP: 4
Assertions ❌
- ✅: Has an h1 (R): pass
- ✅: Has single h1 (BP): pass
- ✅: Has at least one h2 (R): pass
- ✅: Has a single banner (R): pass
- ❌: Has a single maincontent (R): fail
- ✅: Has at least one navigation (R): pass
- ✅: Has a single footer (R): pass
Axe WCAG Failures (3) ❌
- (3x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (4) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 24 (Gemini 3 Flash Preview)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 2 | BP: 4
Assertions ❌
- ✅: Has an h1 (R): pass
- ✅: Has single h1 (BP): pass
- ✅: Has at least one h2 (R): pass
- ✅: Has a single banner (R): pass
- ❌: Has a single maincontent (R): fail
- ✅: Has at least one navigation (R): pass
- ✅: Has a single footer (R): pass
Axe WCAG Failures (2) ❌
- (2x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (4) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 0 (Gemini 3 Flash Preview)
Instruction set: 1. Basic
FAIL | Latency 0.00s
Axe WCAG: 5 | BP: 1
Assertions ❌
- ❌: Has an h1 (R): fail
- ❌: Has single h1 (BP): fail
- ✅: Has at least one h2 (R): pass
- ✅: Has a single banner (R): pass
- ✅: Has a single maincontent (R): pass
- ✅: Has at least one navigation (R): pass
- ✅: Has a single footer (R): pass
Axe WCAG Failures (5) ❌
- (5x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (1) ⚠️
- page-has-heading-one (moderate): Ensure that the page, or at least one of its frames contains a level-one heading (Best Practice - does not affect pass/fail)
Sample 1 (Gemini 3 Flash Preview)
Instruction set: 1. Basic
PASS | Latency 0.00s
Axe WCAG: 0
Assertions ✅
- ✅: Has an h1 (R): pass
- ✅: Has single h1 (BP): pass
- ✅: Has at least one h2 (R): pass
- ✅: Has a single banner (R): pass
- ✅: Has a single maincontent (R): pass
- ✅: Has at least one navigation (R): pass
- ✅: Has a single footer (R): pass
Sample 2 (Gemini 3 Flash Preview)
Instruction set: 1. Basic
PASS | Latency 0.00s
Axe WCAG: 0
Assertions ✅
- ✅: Has an h1 (R): pass
- ✅: Has single h1 (BP): pass
- ✅: Has at least one h2 (R): pass
- ✅: Has a single banner (R): pass
- ✅: Has a single maincontent (R): pass
- ✅: Has at least one navigation (R): pass
- ✅: Has a single footer (R): pass
Sample 3 (Gemini 3 Flash Preview)
Instruction set: 1. Basic
PASS | Latency 0.00s
Axe WCAG: 0
Assertions ✅
- ✅: Has an h1 (R): pass
- ✅: Has single h1 (BP): pass
- ✅: Has at least one h2 (R): pass
- ✅: Has a single banner (R): pass
- ✅: Has a single maincontent (R): pass
- ✅: Has at least one navigation (R): pass
- ✅: Has a single footer (R): pass
Sample 4 (Gemini 3 Flash Preview)
Instruction set: 1. Basic
PASS | Latency 0.00s
Axe WCAG: 0
Assertions ✅
- ✅: Has an h1 (R): pass
- ✅: Has single h1 (BP): pass
- ✅: Has at least one h2 (R): pass
- ✅: Has a single banner (R): pass
- ✅: Has a single maincontent (R): pass
- ✅: Has at least one navigation (R): pass
- ✅: Has a single footer (R): pass
Sample 0 (Gemini 3 Flash Preview)
Instruction set: 0. Minimal
FAIL | Latency 0.00s
Axe WCAG: 1
Assertions ✅
- ✅: Has an h1 (R): pass
- ✅: Has single h1 (BP): pass
- ✅: Has at least one h2 (R): pass
- ✅: Has a single banner (R): pass
- ✅: Has a single maincontent (R): pass
- ✅: Has at least one navigation (R): pass
- ✅: Has a single footer (R): pass
Axe WCAG Failures (1) ❌
- (1x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Sample 1 (Gemini 3 Flash Preview)
Instruction set: 0. Minimal
FAIL | Latency 0.00s
Axe WCAG: 5
Assertions ✅
- ✅: Has an h1 (R): pass
- ✅: Has single h1 (BP): pass
- ✅: Has at least one h2 (R): pass
- ✅: Has a single banner (R): pass
- ✅: Has a single maincontent (R): pass
- ✅: Has at least one navigation (R): pass
- ✅: Has a single footer (R): pass
Axe WCAG Failures (5) ❌
- (5x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Sample 2 (Gemini 3 Flash Preview)
Instruction set: 0. Minimal
FAIL | Latency 0.00s
Axe WCAG: 4
Assertions ✅
- ✅: Has an h1 (R): pass
- ✅: Has single h1 (BP): pass
- ✅: Has at least one h2 (R): pass
- ✅: Has a single banner (R): pass
- ✅: Has a single maincontent (R): pass
- ✅: Has at least one navigation (R): pass
- ✅: Has a single footer (R): pass
Axe WCAG Failures (4) ❌
- (4x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Sample 3 (Gemini 3 Flash Preview)
Instruction set: 0. Minimal
PASS | Latency 0.00s
Axe WCAG: 0
Assertions ✅
- ✅: Has an h1 (R): pass
- ✅: Has single h1 (BP): pass
- ✅: Has at least one h2 (R): pass
- ✅: Has a single banner (R): pass
- ✅: Has a single maincontent (R): pass
- ✅: Has at least one navigation (R): pass
- ✅: Has a single footer (R): pass
Sample 4 (Gemini 3 Flash Preview)
Instruction set: 0. Minimal
PASS | Latency 0.00s
Axe WCAG: 0
Assertions ✅
- ✅: Has an h1 (R): pass
- ✅: Has single h1 (BP): pass
- ✅: Has at least one h2 (R): pass
- ✅: Has a single banner (R): pass
- ✅: Has a single maincontent (R): pass
- ✅: Has at least one navigation (R): pass
- ✅: Has a single footer (R): pass
Sample 0 (Gemini 3 Flash Preview)
Instruction set: 2. Detailed Instructions
PASS | Latency 0.00s
Axe WCAG: 0
Assertions ✅
- ✅: Has an h1 (R): pass
- ✅: Has single h1 (BP): pass
- ✅: Has at least one h2 (R): pass
- ✅: Has a single banner (R): pass
- ✅: Has a single maincontent (R): pass
- ✅: Has at least one navigation (R): pass
- ✅: Has a single footer (R): pass
Sample 1 (Gemini 3 Flash Preview)
Instruction set: 2. Detailed Instructions
PASS | Latency 0.00s
Axe WCAG: 0
Assertions ✅
- ✅: Has an h1 (R): pass
- ✅: Has single h1 (BP): pass
- ✅: Has at least one h2 (R): pass
- ✅: Has a single banner (R): pass
- ✅: Has a single maincontent (R): pass
- ✅: Has at least one navigation (R): pass
- ✅: Has a single footer (R): pass
Sample 2 (Gemini 3 Flash Preview)
Instruction set: 2. Detailed Instructions
PASS | Latency 0.00s
Axe WCAG: 0
Assertions ✅
- ✅: Has an h1 (R): pass
- ✅: Has single h1 (BP): pass
- ✅: Has at least one h2 (R): pass
- ✅: Has a single banner (R): pass
- ✅: Has a single maincontent (R): pass
- ✅: Has at least one navigation (R): pass
- ✅: Has a single footer (R): pass
Sample 3 (Gemini 3 Flash Preview)
Instruction set: 2. Detailed Instructions
PASS | Latency 0.00s
Axe WCAG: 0
Assertions ✅
- ✅: Has an h1 (R): pass
- ✅: Has single h1 (BP): pass
- ✅: Has at least one h2 (R): pass
- ✅: Has a single banner (R): pass
- ✅: Has a single maincontent (R): pass
- ✅: Has at least one navigation (R): pass
- ✅: Has a single footer (R): pass
Sample 4 (Gemini 3 Flash Preview)
Instruction set: 2. Detailed Instructions
PASS | Latency 0.00s
Axe WCAG: 0
Assertions ✅
- ✅: Has an h1 (R): pass
- ✅: Has single h1 (BP): pass
- ✅: Has at least one h2 (R): pass
- ✅: Has a single banner (R): pass
- ✅: Has a single maincontent (R): pass
- ✅: Has at least one navigation (R): pass
- ✅: Has a single footer (R): pass
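The PASS/FAIL verdicts in the samples above appear to follow a simple rule: a sample passes only when axe reports zero WCAG failures and every assertion marked (R) passes, while Best Practice (BP) items are reported but ignored. The report states only that Best Practice issues do not affect pass/fail, so the rest of this rule is an assumption inferred from the listed samples. The `Assertion` type and `sample_passes` function are hypothetical names used for illustration.

```python
from dataclasses import dataclass


@dataclass
class Assertion:
    name: str
    required: bool  # True for "(R)" assertions, False for "(BP)" ones
    passed: bool


def sample_passes(axe_wcag_failures: int, assertions: list[Assertion]) -> bool:
    """Assumed verdict rule: no axe WCAG failures and all required assertions pass."""
    required_ok = all(a.passed for a in assertions if a.required)
    return axe_wcag_failures == 0 and required_ok


if __name__ == "__main__":
    all_green = [
        Assertion("Has an h1", True, True),
        Assertion("Has single h1", False, True),
        Assertion("Has a single maincontent", True, True),
    ]
    # Clean assertions but four contrast failures, like Control sample 1 above.
    print(sample_passes(4, all_green))
    # Clean assertions and no axe failures, like Control sample 11 above.
    print(sample_passes(0, all_green))
```

Under this rule a sample with perfect landmark and heading structure still fails if a single `color-contrast` instance remains, which matches the Control samples listed for this model.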
Gemini 3.5 Pro Preview
Control: 0%
0. Minimal: 0%
1. Basic: 20%
2. Detailed Instructions: 100%
Aggregates are shown when filtering to a specific instruction set.
Instruction set: Control
Samples: 25 | Passes: 0
| pass@1 | pass@5 | pass@10 |
|---|---|---|
| 0% | 0% | 0% |
Instruction set: 0. Minimal
Samples: 5 | Passes: 0
| pass@1 | pass@5 | pass@10 |
|---|---|---|
| 0% | 0% | 0% |
Instruction set: 1. Basic
Samples: 5 | Passes: 1
| pass@1 | pass@5 | pass@10 |
|---|---|---|
| 20% | 100% | 100% |
Instruction set: 2. Detailed Instructions
Samples: 5 | Passes: 5
| pass@1 | pass@5 | pass@10 |
|---|---|---|
| 100% | 100% | 100% |
Sample 0 (Gemini 3.5 Pro Preview)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 25 | BP: 45
Assertions ❌
- ✅: Has an h1 (R): pass
- ✅: Has single h1 (BP): pass
- ✅: Has at least one h2 (R): pass
- ✅: Has a single banner (R): pass
- ❌: Has a single maincontent (R): fail
- ✅: Has at least one navigation (R): pass
- ✅: Has a single footer (R): pass
Axe WCAG Failures (25) ❌
- (19x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
- (6x) - link-name (serious): Ensure links have discernible text
Axe Best Practice Issues (45) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 1 (Gemini 3.5 Pro Preview)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 18 | BP: 9
Assertions ❌
- ✅: Has an h1 (R): pass
- ✅: Has single h1 (BP): pass
- ✅: Has at least one h2 (R): pass
- ✅: Has a single banner (R): pass
- ❌: Has a single maincontent (R): fail
- ✅: Has at least one navigation (R): pass
- ✅: Has a single footer (R): pass
Axe WCAG Failures (18) ❌
- (4x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
- (14x) - link-name (serious): Ensure links have discernible text
Axe Best Practice Issues (9) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 2 (Gemini 3.5 Pro Preview)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 24 | BP: 45
Assertions ❌
- ✅: Has an h1 (R): pass
- ✅: Has single h1 (BP): pass
- ✅: Has at least one h2 (R): pass
- ✅: Has a single banner (R): pass
- ❌: Has a single maincontent (R): fail
- ✅: Has at least one navigation (R): pass
- ✅: Has a single footer (R): pass
Axe WCAG Failures (24) ❌
- (24x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (45) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 3 (Gemini 3.5 Pro Preview)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 6 | BP: 46
Assertions ❌
- ✅: Has an h1 (R): pass
- ✅: Has single h1 (BP): pass
- ✅: Has at least one h2 (R): pass
- ✅: Has a single banner (R): pass
- ❌: Has a single maincontent (R): fail
- ✅: Has at least one navigation (R): pass
- ✅: Has a single footer (R): pass
Axe WCAG Failures (6) ❌
- (6x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (46) ⚠️
- heading-order (moderate): Ensure the order of headings is semantically correct (Best Practice - does not affect pass/fail)
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 4 (Gemini 3.5 Pro Preview)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 11 | BP: 46
Assertions ❌
- ✅: Has an h1 (R): pass
- ✅: Has single h1 (BP): pass
- ✅: Has at least one h2 (R): pass
- ✅: Has a single banner (R): pass
- ❌: Has a single maincontent (R): fail
- ❌: Has at least one navigation (R): fail
- ✅: Has a single footer (R): pass
Axe WCAG Failures (11) ❌
- (11x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (46) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 5 (Gemini 3.5 Pro Preview)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 4 | BP: 5
Assertions ❌
- ✅: Has an h1 (R): pass
- ✅: Has single h1 (BP): pass
- ✅: Has at least one h2 (R): pass
- ✅: Has a single banner (R): pass
- ❌: Has a single maincontent (R): fail
- ✅: Has at least one navigation (R): pass
- ✅: Has a single footer (R): pass
Axe WCAG Failures (4) ❌
- (4x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (5) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 6 (Gemini 3.5 Pro Preview)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 8 | BP: 45
Assertions ❌
- ✅: Has an h1 (R): pass
- ✅: Has single h1 (BP): pass
- ✅: Has at least one h2 (R): pass
- ✅: Has a single banner (R): pass
- ❌: Has a single maincontent (R): fail
- ✅: Has at least one navigation (R): pass
- ✅: Has a single footer (R): pass
Axe WCAG Failures (8) ❌
- (8x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (45) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 7 (Gemini 3.5 Pro Preview)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 4 | BP: 47
Assertions ❌
- ✅: Has an h1 (R): pass
- ✅: Has single h1 (BP): pass
- ✅: Has at least one h2 (R): pass
- ✅: Has a single banner (R): pass
- ❌: Has a single maincontent (R): fail
- ✅: Has at least one navigation (R): pass
- ✅: Has a single footer (R): pass
Axe WCAG Failures (4) ❌
- (4x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (47) ⚠️
- heading-order (moderate): Ensure the order of headings is semantically correct (Best Practice - does not affect pass/fail)
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 8 (Gemini 3.5 Pro Preview)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 43 | BP: 11
Assertions ❌
- ✅: Has an h1 (R): pass
- ✅: Has single h1 (BP): pass
- ✅: Has at least one h2 (R): pass
- ✅: Has a single banner (R): pass
- ❌: Has a single maincontent (R): fail
- ✅: Has at least one navigation (R): pass
- ✅: Has a single footer (R): pass
Axe WCAG Failures (43) ❌
- (29x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
- (14x) - link-name (serious): Ensure links have discernible text
Axe Best Practice Issues (11) ⚠️
- heading-order (moderate): Ensure the order of headings is semantically correct (Best Practice - does not affect pass/fail)
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 9 (Gemini 3.5 Pro Preview)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 19 | BP: 44
Assertions ❌
- ✅: Has an h1 (R): pass
- ✅: Has single h1 (BP): pass
- ✅: Has at least one h2 (R): pass
- ✅: Has a single banner (R): pass
- ❌: Has a single maincontent (R): fail
- ✅: Has at least one navigation (R): pass
- ✅: Has a single footer (R): pass
Axe WCAG Failures (19) ❌
- (19x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (44) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 10 (Gemini 3.5 Pro Preview)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 4 | BP: 42
Assertions ❌
- ✅: Has an h1 (R): pass
- ✅: Has single h1 (BP): pass
- ✅: Has at least one h2 (R): pass
- ✅: Has a single banner (R): pass
- ❌: Has a single maincontent (R): fail
- ✅: Has at least one navigation (R): pass
- ✅: Has a single footer (R): pass
Axe WCAG Failures (4) ❌
- (4x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (42) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 11 (Gemini 3.5 Pro Preview)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 13 | BP: 37
Assertions ❌
- ✅: Has an h1 (R): pass
- ✅: Has single h1 (BP): pass
- ✅: Has at least one h2 (R): pass
- ✅: Has a single banner (R): pass
- ❌: Has a single maincontent (R): fail
- ✅: Has at least one navigation (R): pass
- ✅: Has a single footer (R): pass
Axe WCAG Failures (13) ❌
- (4x) - button-name (critical): Ensure buttons have discernible text
- (9x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (37) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 12 (Gemini 3.5 Pro Preview)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 5 | BP: 44
Assertions ❌
- ✅: Has an h1 (R): pass
- ✅: Has single h1 (BP): pass
- ✅: Has at least one h2 (R): pass
- ✅: Has a single banner (R): pass
- ❌: Has a single maincontent (R): fail
- ✅: Has at least one navigation (R): pass
- ✅: Has a single footer (R): pass
Axe WCAG Failures (5) ❌
- (5x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (44) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 13 (Gemini 3.5 Pro Preview)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 6 | BP: 43
Assertions ❌
- ✅: Has an h1 (R): pass
- ✅: Has single h1 (BP): pass
- ✅: Has at least one h2 (R): pass
- ✅: Has a single banner (R): pass
- ❌: Has a single maincontent (R): fail
- ✅: Has at least one navigation (R): pass
- ✅: Has a single footer (R): pass
Axe WCAG Failures (6) ❌
- (6x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (43) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 14 (Gemini 3.5 Pro Preview)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 21 | BP: 41
Assertions ❌
- ✅: Has an h1 (R): pass
- ✅: Has single h1 (BP): pass
- ✅: Has at least one h2 (R): pass
- ✅: Has a single banner (R): pass
- ❌: Has a single maincontent (R): fail
- ✅: Has at least one navigation (R): pass
- ✅: Has a single footer (R): pass
Axe WCAG Failures (21) ❌
- (8x) - button-name (critical): Ensure buttons have discernible text
- (9x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
- (4x) - link-name (serious): Ensure links have discernible text
Axe Best Practice Issues (41) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
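In each sample, the count in the "Axe WCAG Failures" header appears to be the sum of the per-rule `(Nx)` counts listed under it (8 + 9 + 4 = 21 in the sample above). The sketch below checks that relationship on the plain-text report lines. The regular expression and the `tally` helper are hypothetical and assume the line layout shown in this listing.

```python
import re
from collections import Counter

# Matches rows such as "- (8x) - button-name (critical): Ensure ..."
ROW = re.compile(r"^- \((\d+)x\) - ([a-z0-9-]+) \((\w+)\):")


def tally(lines):
    """Sum the per-rule violation counts found in a block of report lines."""
    counts = Counter()
    for line in lines:
        match = ROW.match(line.strip())
        if match:
            counts[match.group(2)] += int(match.group(1))
    return counts


report = [
    "Axe WCAG Failures (21) ❌",
    "- (8x) - button-name (critical): Ensure buttons have discernible text",
    "- (9x) - color-contrast (serious): Ensure the contrast between foreground "
    "and background colors meets WCAG 2 AA minimum contrast ratio thresholds",
    "- (4x) - link-name (serious): Ensure links have discernible text",
]

counts = tally(report)
header_total = int(re.search(r"\((\d+)\)", report[0]).group(1))
print(dict(counts))
print("header matches sum:", sum(counts.values()) == header_total)
```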
Sample 15 (Gemini 3.5 Pro Preview)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 13 | BP: 51
Assertions ❌
- ✅: Has an h1 (R): pass
- ✅: Has single h1 (BP): pass
- ✅: Has at least one h2 (R): pass
- ✅: Has a single banner (R): pass
- ❌: Has a single maincontent (R): fail
- ✅: Has at least one navigation (R): pass
- ✅: Has a single footer (R): pass
Axe WCAG Failures (13) ❌
- (13x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (51) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 16 (Gemini 3.5 Pro Preview)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 35 | BP: 53
Assertions ❌
- ✅: Has an h1 (R): pass
- ✅: Has single h1 (BP): pass
- ✅: Has at least one h2 (R): pass
- ✅: Has a single banner (R): pass
- ❌: Has a single maincontent (R): fail
- ✅: Has at least one navigation (R): pass
- ✅: Has a single footer (R): pass
Axe WCAG Failures (35) ❌
- (35x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (53) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 17 (Gemini 3.5 Pro Preview)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 2 | BP: 21
Assertions ❌
- ✅: Has an h1 (R): pass
- ✅: Has single h1 (BP): pass
- ✅: Has at least one h2 (R): pass
- ✅: Has a single banner (R): pass
- ❌: Has a single maincontent (R): fail
- ✅: Has at least one navigation (R): pass
- ✅: Has a single footer (R): pass
Axe WCAG Failures (2) ❌
- (2x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (21) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 18 (Gemini 3.5 Pro Preview)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 10 | BP: 42
Assertions ❌
- ✅: Has an h1 (R): pass
- ✅: Has single h1 (BP): pass
- ✅: Has at least one h2 (R): pass
- ✅: Has a single banner (R): pass
- ❌: Has a single maincontent (R): fail
- ✅: Has at least one navigation (R): pass
- ✅: Has a single footer (R): pass
Axe WCAG Failures (10) ❌
- (4x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
- (6x) - link-name (serious): Ensure links have discernible text
Axe Best Practice Issues (42) ⚠️
- heading-order (moderate): Ensure the order of headings is semantically correct (Best Practice - does not affect pass/fail)
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 19 (Gemini 3.5 Pro Preview)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 4 | BP: 20
Assertions ❌
- ✅: Has an h1 (R): pass
- ✅: Has single h1 (BP): pass
- ✅: Has at least one h2 (R): pass
- ✅: Has a single banner (R): pass
- ❌: Has a single maincontent (R): fail
- ✅: Has at least one navigation (R): pass
- ✅: Has a single footer (R): pass
Axe WCAG Failures (4) ❌
- (4x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (20) ⚠️
- heading-order (moderate): Ensure the order of headings is semantically correct (Best Practice - does not affect pass/fail)
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 20 (Gemini 3.5 Pro Preview)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 21 | BP: 32
Assertions ❌
- ✅: Has an h1 (R): pass
- ✅: Has single h1 (BP): pass
- ✅: Has at least one h2 (R): pass
- ✅: Has a single banner (R): pass
- ❌: Has a single maincontent (R): fail
- ✅: Has at least one navigation (R): pass
- ✅: Has a single footer (R): pass
Axe WCAG Failures (21) ❌
- (21x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (32) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 21 (Gemini 3.5 Pro Preview)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 16 | BP: 23
Assertions ❌
- ✅: Has an h1 (R): pass
- ✅: Has single h1 (BP): pass
- ✅: Has at least one h2 (R): pass
- ✅: Has a single banner (R): pass
- ❌: Has a single maincontent (R): fail
- ✅: Has at least one navigation (R): pass
- ✅: Has a single footer (R): pass
Axe WCAG Failures (16) ❌
- (16x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (23) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 22 (Gemini 3.5 Pro Preview)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 17 | BP: 35
Assertions ❌
- ✅: Has an h1 (R): pass
- ✅: Has single h1 (BP): pass
- ✅: Has at least one h2 (R): pass
- ✅: Has a single banner (R): pass
- ❌: Has a single maincontent (R): fail
- ✅: Has at least one navigation (R): pass
- ✅: Has a single footer (R): pass
Axe WCAG Failures (17) ❌
- (17x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (35) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 23 (Gemini 3.5 Pro Preview)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 10 | BP: 22
Assertions ❌
- ✅: Has an h1 (R): pass
- ✅: Has single h1 (BP): pass
- ✅: Has at least one h2 (R): pass
- ✅: Has a single banner (R): pass
- ❌: Has a single maincontent (R): fail
- ✅: Has at least one navigation (R): pass
- ✅: Has a single footer (R): pass
Axe WCAG Failures (10) ❌
- (10x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (22) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 24 (Gemini 3.5 Pro Preview)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 24 | BP: 4
Assertions ❌
- ✅: Has an h1 (R): pass
- ✅: Has single h1 (BP): pass
- ✅: Has at least one h2 (R): pass
- ✅: Has a single banner (R): pass
- ❌: Has a single maincontent (R): fail
- ✅: Has at least one navigation (R): pass
- ✅: Has a single footer (R): pass
Axe WCAG Failures (24) ❌
- (24x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (4) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 0 (Gemini 3.5 Pro Preview)
Instruction set: 1. Basic
FAIL | Latency 0.00s
Axe WCAG: 6 | BP: 1
Assertions ❌
- ❌: Has an h1 (R): fail
- ❌: Has single h1 (BP): fail
- ✅: Has at least one h2 (R): pass
- ✅: Has a single banner (R): pass
- ✅: Has a single maincontent (R): pass
- ✅: Has at least one navigation (R): pass
- ✅: Has a single footer (R): pass
Axe WCAG Failures (6) ❌
- (6x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (1) ⚠️
- page-has-heading-one (moderate): Ensure that the page, or at least one of its frames contains a level-one heading (Best Practice - does not affect pass/fail)
Sample 1 (Gemini 3.5 Pro Preview)
Instruction set: 1. Basic
FAIL | Latency 0.00s
Axe WCAG: 3
Assertions ✅
- ✅: Has an h1 (R): pass
- ✅: Has single h1 (BP): pass
- ✅: Has at least one h2 (R): pass
- ✅: Has a single banner (R): pass
- ✅: Has a single maincontent (R): pass
- ✅: Has at least one navigation (R): pass
- ✅: Has a single footer (R): pass
Axe WCAG Failures (3) ❌
- (3x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Sample 2 (Gemini 3.5 Pro Preview)
Instruction set: 1. Basic
FAIL | Latency 0.00s
Axe WCAG: 15
Assertions ✅
- ✅: Has an h1 (R): pass
- ✅: Has single h1 (BP): pass
- ✅: Has at least one h2 (R): pass
- ✅: Has a single banner (R): pass
- ✅: Has a single maincontent (R): pass
- ✅: Has at least one navigation (R): pass
- ✅: Has a single footer (R): pass
Axe WCAG Failures (15) ❌
- (15x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Sample 3 (Gemini 3.5 Pro Preview)
Instruction set: 1. Basic
FAIL | Latency 0.00s
Axe WCAG: 3
Assertions ✅
- ✅: Has an h1 (R): pass
- ✅: Has single h1 (BP): pass
- ✅: Has at least one h2 (R): pass
- ✅: Has a single banner (R): pass
- ✅: Has a single maincontent (R): pass
- ✅: Has at least one navigation (R): pass
- ✅: Has a single footer (R): pass
Axe WCAG Failures (3) ❌
- (3x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Sample 4 (Gemini 3.5 Pro Preview)
Instruction set: 1. Basic
PASS | Latency 0.00s
Axe WCAG: 0
Assertions ✅
- ✅: Has an h1 (R): pass
- ✅: Has single h1 (BP): pass
- ✅: Has at least one h2 (R): pass
- ✅: Has a single banner (R): pass
- ✅: Has a single maincontent (R): pass
- ✅: Has at least one navigation (R): pass
- ✅: Has a single footer (R): pass
Sample 0 (Gemini 3.5 Pro Preview)
Instruction set: 0. Minimal
FAIL | Latency 0.00s
Axe WCAG: 19
Assertions ✅
- ✅: Has an h1 (R): pass
- ✅: Has single h1 (BP): pass
- ✅: Has at least one h2 (R): pass
- ✅: Has a single banner (R): pass
- ✅: Has a single maincontent (R): pass
- ✅: Has at least one navigation (R): pass
- ✅: Has a single footer (R): pass
Axe WCAG Failures (19) ❌
- (19x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Sample 1 (Gemini 3.5 Pro Preview)
Instruction set: 0. Minimal
FAIL | Latency 0.00s
Axe WCAG: 1
Assertions ✅
- ✅: Has an h1 (R): pass
- ✅: Has single h1 (BP): pass
- ✅: Has at least one h2 (R): pass
- ✅: Has a single banner (R): pass
- ✅: Has a single maincontent (R): pass
- ✅: Has at least one navigation (R): pass
- ✅: Has a single footer (R): pass
Axe WCAG Failures (1) ❌
- (1x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
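Most failures in this listing come from the `color-contrast` rule, which tests text against the WCAG 2 AA thresholds (4.5:1 for normal text, 3:1 for large text). As a rough illustration of what is being measured, the sketch below computes the WCAG 2 contrast ratio from relative luminance. The colours are hypothetical examples, not values taken from any sample, and the real rule in axe-core also accounts for factors such as opacity, background images and text size that are not modelled here.

```python
def _linear(channel: int) -> float:
    """Convert an 8-bit sRGB channel to its linear-light value (WCAG 2 formula)."""
    c = channel / 255
    return c / 12.92 if c <= 0.03928 else ((c + 0.055) / 1.055) ** 2.4


def relative_luminance(rgb):
    """Relative luminance of an (R, G, B) colour with 0-255 channels."""
    r, g, b = (_linear(v) for v in rgb)
    return 0.2126 * r + 0.7152 * g + 0.0722 * b


def contrast_ratio(fg, bg):
    """WCAG 2 contrast ratio between two colours, from 1.0 up to 21.0."""
    lighter, darker = sorted(
        (relative_luminance(fg), relative_luminance(bg)), reverse=True
    )
    return (lighter + 0.05) / (darker + 0.05)


AA_NORMAL_TEXT = 4.5  # minimum ratio for normal-size text at level AA

# Hypothetical colour pairs: (foreground, background).
pairs = {
    "black on white": ((0, 0, 0), (255, 255, 255)),
    "light grey on white": ((153, 153, 153), (255, 255, 255)),
}
for label, (fg, bg) in pairs.items():
    ratio = contrast_ratio(fg, bg)
    verdict = "meets" if ratio >= AA_NORMAL_TEXT else "below"
    print(f"{label}: {ratio:.2f}:1 ({verdict} AA for normal text)")
```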
Sample 2 (Gemini 3.5 Pro Preview)
Instruction set: 0. Minimal
FAIL | Latency 0.00s
Axe WCAG: 5
Assertions ✅
- ✅: Has an h1 (R): pass
- ✅: Has single h1 (BP): pass
- ✅: Has at least one h2 (R): pass
- ✅: Has a single banner (R): pass
- ✅: Has a single maincontent (R): pass
- ✅: Has at least one navigation (R): pass
- ✅: Has a single footer (R): pass
Axe WCAG Failures (5) ❌
- (5x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Sample 3 (Gemini 3.5 Pro Preview)
Instruction set: 0. Minimal
FAIL | Latency 0.00s
Axe WCAG: 5
Assertions ✅
- ✅: Has an h1 (R): pass
- ✅: Has single h1 (BP): pass
- ✅: Has at least one h2 (R): pass
- ✅: Has a single banner (R): pass
- ✅: Has a single maincontent (R): pass
- ✅: Has at least one navigation (R): pass
- ✅: Has a single footer (R): pass
Axe WCAG Failures (5) ❌
- (5x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Sample 4 (Gemini 3.5 Pro Preview)
Instruction set: 0. Minimal
FAIL | Latency 0.00s
Axe WCAG: 4
Assertions ✅
- ✅: Has an h1 (R): pass
- ✅: Has single h1 (BP): pass
- ✅: Has at least one h2 (R): pass
- ✅: Has a single banner (R): pass
- ✅: Has a single maincontent (R): pass
- ✅: Has at least one navigation (R): pass
- ✅: Has a single footer (R): pass
Axe WCAG Failures (4) ❌
- (3x) - aria-prohibited-attr (serious): Ensure ARIA attributes are not prohibited for an element's role
- (1x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Sample 0 (Gemini 3.5 Pro Preview)
Instruction set: 2. Detailed Instructions
PASS | Latency 0.00s
Axe WCAG: 0
Assertions ✅
- ✅: Has an h1 (R): pass
- ✅: Has single h1 (BP): pass
- ✅: Has at least one h2 (R): pass
- ✅: Has a single banner (R): pass
- ✅: Has a single maincontent (R): pass
- ✅: Has at least one navigation (R): pass
- ✅: Has a single footer (R): pass
Sample 1 (Gemini 3.5 Pro Preview)
Instruction set: 2. Detailed Instructions
PASS | Latency 0.00s
Axe WCAG: 0
Assertions ✅
- ✅: Has an h1 (R): pass
- ✅: Has single h1 (BP): pass
- ✅: Has at least one h2 (R): pass
- ✅: Has a single banner (R): pass
- ✅: Has a single maincontent (R): pass
- ✅: Has at least one navigation (R): pass
- ✅: Has a single footer (R): pass
Sample 2 (Gemini 3.5 Pro Preview)
Instruction set: 2. Detailed Instructions
PASS | Latency 0.00s
Axe WCAG: 0
Assertions ✅
- ✅: Has an h1 (R): pass
- ✅: Has single h1 (BP): pass
- ✅: Has at least one h2 (R): pass
- ✅: Has a single banner (R): pass
- ✅: Has a single maincontent (R): pass
- ✅: Has at least one navigation (R): pass
- ✅: Has a single footer (R): pass
Sample 3 (Gemini 3.5 Pro Preview)
Instruction set: 2. Detailed Instructions
PASS | Latency 0.00s
Axe WCAG: 0
Assertions ✅
- ✅: Has an h1 (R): pass
- ✅: Has single h1 (BP): pass
- ✅: Has at least one h2 (R): pass
- ✅: Has a single banner (R): pass
- ✅: Has a single maincontent (R): pass
- ✅: Has at least one navigation (R): pass
- ✅: Has a single footer (R): pass
Sample 4 (Gemini 3.5 Pro Preview)
Instruction set: 2. Detailed Instructions
PASS | Latency 0.00s
Axe WCAG: 0
Assertions ✅
- ✅: Has an h1 (R): pass
- ✅: Has single h1 (BP): pass
- ✅: Has at least one h2 (R): pass
- ✅: Has a single banner (R): pass
- ✅: Has a single maincontent (R): pass
- ✅: Has at least one navigation (R): pass
- ✅: Has a single footer (R): pass
GPT-5 Mini
Control: 4%
0. Minimal: 0%
1. Basic: 20%
2. Detailed Instructions: 40%
Instruction set: Control | Samples: 25 | Passes: 1
| pass@1 | pass@5 | pass@10 |
|---|---|---|
| 4% | 20% | 40% |
Instruction set: 0. Minimal | Samples: 5 | Passes: 0
| pass@1 | pass@5 | pass@10 |
|---|---|---|
| 0% | 0% | 0% |
Instruction set: 1. Basic | Samples: 5 | Passes: 1
| pass@1 | pass@5 | pass@10 |
|---|---|---|
| 20% | 100% | 100% |
Instruction set: 2. Detailed Instructions | Samples: 5 | Passes: 2
| pass@1 | pass@5 | pass@10 |
|---|---|---|
| 40% | 100% | 100% |
Sample 0 (GPT-5 Mini)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 32
Assertions ❌
- ✅: Has an h1 (R): pass
- ✅: Has single h1 (BP): pass
- ✅: Has at least one h2 (R): pass
- ✅: Has a single banner (R): pass
- ✅: Has a single maincontent (R): pass
- ✅: Has at least one navigation (R): pass
- ❌: Has a single footer (R): fail
Axe WCAG Failures (32) ❌
- (1x) - aria-required-children (critical): Ensure elements with an ARIA role that require child roles contain them
- (31x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Sample 1 (GPT-5 Mini)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 16 | BP: 8
Assertions ❌
- ✅: Has an h1 (R): pass
- ✅: Has single h1 (BP): pass
- ❌: Has at least one h2 (R): fail
- ✅: Has a single banner (R): pass
- ✅: Has a single maincontent (R): pass
- ✅: Has at least one navigation (R): pass
- ❌: Has a single footer (R): fail
Axe WCAG Failures (16) ❌
- (16x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (8) ⚠️
- aria-allowed-role (minor): Ensure role attribute has an appropriate value for the element (Best Practice - does not affect pass/fail)
Sample 2 (GPT-5 Mini)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 19 | BP: 16
Assertions ❌
- ✅: Has an h1 (R): pass
- ✅: Has single h1 (BP): pass
- ✅: Has at least one h2 (R): pass
- ❌: Has a single banner (R): fail
- ✅: Has a single maincontent (R): pass
- ✅: Has at least one navigation (R): pass
- ❌: Has a single footer (R): fail
Axe WCAG Failures (19) ❌
- (19x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (16) ⚠️
- aria-allowed-role (minor): Ensure role attribute has an appropriate value for the element (Best Practice - does not affect pass/fail)
- landmark-banner-is-top-level (moderate): Ensure the banner landmark is at top level (Best Practice - does not affect pass/fail)
- landmark-no-duplicate-banner (moderate): Ensure the document has at most one banner landmark (Best Practice - does not affect pass/fail)
- landmark-unique (moderate): Ensure landmarks are unique (Best Practice - does not affect pass/fail)
Sample 3 (GPT-5 Mini)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 24 | BP: 2
Assertions ❌
- ✅: Has an h1 (R): pass
- ✅: Has single h1 (BP): pass
- ✅: Has at least one h2 (R): pass
- ✅: Has a single banner (R): pass
- ✅: Has a single maincontent (R): pass
- ✅: Has at least one navigation (R): pass
- ❌: Has a single footer (R): fail
Axe WCAG Failures (24) ❌
- (1x) - aria-hidden-focus (serious): Ensure aria-hidden elements are not focusable nor contain focusable elements
- (23x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (2) ⚠️
- heading-order (moderate): Ensure the order of headings is semantically correct (Best Practice - does not affect pass/fail)
- landmark-complementary-is-top-level (moderate): Ensure the complementary landmark or aside is at top level (Best Practice - does not affect pass/fail)
Sample 4 (GPT-5 Mini)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 23 | BP: 6
Assertions ❌
- ✅: Has an h1 (R): pass
- ✅: Has single h1 (BP): pass
- ✅: Has at least one h2 (R): pass
- ❌: Has a single banner (R): fail
- ✅: Has a single maincontent (R): pass
- ✅: Has at least one navigation (R): pass
- ❌: Has a single footer (R): fail
Axe WCAG Failures (23) ❌
- (23x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (6) ⚠️
- aria-allowed-role (minor): Ensure role attribute has an appropriate value for the element (Best Practice - does not affect pass/fail)
- landmark-banner-is-top-level (moderate): Ensure the banner landmark is at top level (Best Practice - does not affect pass/fail)
- landmark-contentinfo-is-top-level (moderate): Ensure the contentinfo landmark is at top level (Best Practice - does not affect pass/fail)
Sample 5 (GPT-5 Mini)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 13 | BP: 6
Assertions ✅
- ✅: Has an h1 (R): pass
- ✅: Has single h1 (BP): pass
- ✅: Has at least one h2 (R): pass
- ✅: Has a single banner (R): pass
- ✅: Has a single maincontent (R): pass
- ✅: Has at least one navigation (R): pass
- ✅: Has a single footer (R): pass
Axe WCAG Failures (13) ❌
- (13x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (6) ⚠️
- aria-allowed-role (minor): Ensure role attribute has an appropriate value for the element (Best Practice - does not affect pass/fail)
- landmark-complementary-is-top-level (moderate): Ensure the complementary landmark or aside is at top level (Best Practice - does not affect pass/fail)
Sample 6 (GPT-5 Mini)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 7 | BP: 3
Assertions ✅
- ✅: Has an h1 (R): pass
- ✅: Has single h1 (BP): pass
- ✅: Has at least one h2 (R): pass
- ✅: Has a single banner (R): pass
- ✅: Has a single maincontent (R): pass
- ✅: Has at least one navigation (R): pass
- ✅: Has a single footer (R): pass
Axe WCAG Failures (7) ❌
- (7x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (3) ⚠️
- heading-order (moderate): Ensure the order of headings is semantically correct (Best Practice - does not affect pass/fail)
- landmark-complementary-is-top-level (moderate): Ensure the complementary landmark or aside is at top level (Best Practice - does not affect pass/fail)
- landmark-contentinfo-is-top-level (moderate): Ensure the contentinfo landmark is at top level (Best Practice - does not affect pass/fail)
Sample 7 (GPT-5 Mini)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 11 | BP: 2
Assertions ✅
- ✅: Has an h1 (R): pass
- ✅: Has single h1 (BP): pass
- ✅: Has at least one h2 (R): pass
- ✅: Has a single banner (R): pass
- ✅: Has a single maincontent (R): pass
- ✅: Has at least one navigation (R): pass
- ✅: Has a single footer (R): pass
Axe WCAG Failures (11) ❌
- (11x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (2) ⚠️
- heading-order (moderate): Ensure the order of headings is semantically correct (Best Practice - does not affect pass/fail)
- landmark-complementary-is-top-level (moderate): Ensure the complementary landmark or aside is at top level (Best Practice - does not affect pass/fail)
Sample 8 (GPT-5 Mini)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 3 | BP: 13
Assertions ✅
- ✅: Has an h1 (R): pass
- ✅: Has single h1 (BP): pass
- ✅: Has at least one h2 (R): pass
- ✅: Has a single banner (R): pass
- ✅: Has a single maincontent (R): pass
- ✅: Has at least one navigation (R): pass
- ✅: Has a single footer (R): pass
Axe WCAG Failures (3) ❌
- (3x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (13) ⚠️
- aria-allowed-role (minor): Ensure role attribute has an appropriate value for the element (Best Practice - does not affect pass/fail)
- heading-order (moderate): Ensure the order of headings is semantically correct (Best Practice - does not affect pass/fail)
- landmark-banner-is-top-level (moderate): Ensure the banner landmark is at top level (Best Practice - does not affect pass/fail)
- landmark-complementary-is-top-level (moderate): Ensure the complementary landmark or aside is at top level (Best Practice - does not affect pass/fail)
- landmark-contentinfo-is-top-level (moderate): Ensure the contentinfo landmark is at top level (Best Practice - does not affect pass/fail)
- landmark-main-is-top-level (moderate): Ensure the main landmark is at top level (Best Practice - does not affect pass/fail)
Sample 9 (GPT-5 Mini)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 14 | BP: 2
Assertions ❌
- ✅: Has an h1 (R): pass
- ✅: Has single h1 (BP): pass
- ❌: Has at least one h2 (R): fail
- ✅: Has a single banner (R): pass
- ✅: Has a single maincontent (R): pass
- ✅: Has at least one navigation (R): pass
- ✅: Has a single footer (R): pass
Axe WCAG Failures (14) ❌
- (1x) - aria-prohibited-attr (serious): Ensure ARIA attributes are not prohibited for an element's role
- (3x) - aria-required-children (critical): Ensure elements with an ARIA role that require child roles contain them
- (10x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (2) ⚠️
- heading-order (moderate): Ensure the order of headings is semantically correct (Best Practice - does not affect pass/fail)
- landmark-complementary-is-top-level (moderate): Ensure the complementary landmark or aside is at top level (Best Practice - does not affect pass/fail)
Sample 10 (GPT-5 Mini)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 8 | BP: 2
Assertions ❌
- ✅: Has an h1 (R): pass
- ✅: Has single h1 (BP): pass
- ✅: Has at least one h2 (R): pass
- ❌: Has a single banner (R): fail
- ✅: Has a single maincontent (R): pass
- ✅: Has at least one navigation (R): pass
- ❌: Has a single footer (R): fail
Axe WCAG Failures (8) ❌
- (8x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (2) ⚠️
- landmark-banner-is-top-level (moderate): Ensure the banner landmark is at top level (Best Practice - does not affect pass/fail)
- landmark-contentinfo-is-top-level (moderate): Ensure the contentinfo landmark is at top level (Best Practice - does not affect pass/fail)
Sample 11 (GPT-5 Mini)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 10 | BP: 6
Assertions ❌
- ✅: Has an h1 (R): pass
- ✅: Has single h1 (BP): pass
- ✅: Has at least one h2 (R): pass
- ✅: Has a single banner (R): pass
- ✅: Has a single maincontent (R): pass
- ✅: Has at least one navigation (R): pass
- ❌: Has a single footer (R): fail
Axe WCAG Failures (10) ❌
- (10x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (6) ⚠️
- aria-allowed-role (minor): Ensure role attribute has an appropriate value for the element (Best Practice - does not affect pass/fail)
Sample 12 (GPT-5 Mini)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 12 | BP: 6
Assertions ✅
- ✅: Has an h1 (R): pass
- ✅: Has single h1 (BP): pass
- ✅: Has at least one h2 (R): pass
- ✅: Has a single banner (R): pass
- ✅: Has a single maincontent (R): pass
- ✅: Has at least one navigation (R): pass
- ✅: Has a single footer (R): pass
Axe WCAG Failures (12) ❌
- (12x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (6) ⚠️
- aria-allowed-role (minor): Ensure role attribute has an appropriate value for the element (Best Practice - does not affect pass/fail)
Sample 13 (GPT-5 Mini)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 3
Assertions ❌
- ✅: Has an h1 (R): pass
- ✅: Has single h1 (BP): pass
- ✅: Has at least one h2 (R): pass
- ✅: Has a single banner (R): pass
- ✅: Has a single maincontent (R): pass
- ✅: Has at least one navigation (R): pass
- ❌: Has a single footer (R): fail
Axe WCAG Failures (3) ❌
- (1x) - aria-hidden-focus (serious): Ensure aria-hidden elements are not focusable nor contain focusable elements
- (1x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
- (1x) - select-name (critical): Ensure select element has an accessible name
Sample 14 (GPT-5 Mini)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 9
Assertions ✅
- ✅: Has an h1 (R): pass
- ✅: Has single h1 (BP): pass
- ✅: Has at least one h2 (R): pass
- ✅: Has a single banner (R): pass
- ✅: Has a single maincontent (R): pass
- ✅: Has at least one navigation (R): pass
- ✅: Has a single footer (R): pass
Axe WCAG Failures (9) ❌
- (9x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Sample 15 (GPT-5 Mini)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 12
Assertions ✅
- ✅: Has an h1 (R): pass
- ✅: Has single h1 (BP): pass
- ✅: Has at least one h2 (R): pass
- ✅: Has a single banner (R): pass
- ✅: Has a single maincontent (R): pass
- ✅: Has at least one navigation (R): pass
- ✅: Has a single footer (R): pass
Axe WCAG Failures (12) ❌
- (12x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Sample 16 (GPT-5 Mini)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 16 | BP: 1
Assertions ✅
- ✅: Has an h1 (R): pass
- ✅: Has single h1 (BP): pass
- ✅: Has at least one h2 (R): pass
- ✅: Has a single banner (R): pass
- ✅: Has a single maincontent (R): pass
- ✅: Has at least one navigation (R): pass
- ✅: Has a single footer (R): pass
Axe WCAG Failures (16) ❌
- (1x) - aria-hidden-focus (serious): Ensure aria-hidden elements are not focusable nor contain focusable elements
- (15x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (1) ⚠️
- landmark-complementary-is-top-level (moderate): Ensure the complementary landmark or aside is at top level (Best Practice - does not affect pass/fail)
Sample 17 (GPT-5 Mini)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 1 | BP: 1
Assertions ❌
- ❌: Has an h1 (R): fail
- ❌: Has single h1 (BP): fail
- ✅: Has at least one h2 (R): pass
- ✅: Has a single banner (R): pass
- ✅: Has a single maincontent (R): pass
- ✅: Has at least one navigation (R): pass
- ✅: Has a single footer (R): pass
Axe WCAG Failures (1) ❌
- (1x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (1) ⚠️
- page-has-heading-one (moderate): Ensure that the page, or at least one of its frames contains a level-one heading (Best Practice - does not affect pass/fail)
Sample 18 (GPT-5 Mini)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 30 | BP: 3
Assertions ❌
- ✅: Has an h1 (R): pass
- ✅: Has single h1 (BP): pass
- ✅: Has at least one h2 (R): pass
- ❌: Has a single banner (R): fail
- ✅: Has a single maincontent (R): pass
- ✅: Has at least one navigation (R): pass
- ✅: Has a single footer (R): pass
Axe WCAG Failures (30) ❌
- (30x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (3) ⚠️
- landmark-banner-is-top-level (moderate): Ensure the banner landmark is at top level (Best Practice - does not affect pass/fail)
- landmark-no-duplicate-banner (moderate): Ensure the document has at most one banner landmark (Best Practice - does not affect pass/fail)
- landmark-unique (moderate): Ensure landmarks are unique (Best Practice - does not affect pass/fail)
Sample 19 (GPT-5 Mini)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 21 | BP: 4
Assertions ❌
- ✅: Has an h1 (R): pass
- ✅: Has single h1 (BP): pass
- ✅: Has at least one h2 (R): pass
- ❌: Has a single banner (R): fail
- ✅: Has a single maincontent (R): pass
- ✅: Has at least one navigation (R): pass
- ✅: Has a single footer (R): pass
Axe WCAG Failures (21) ❌
- (21x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (4) ⚠️
- landmark-banner-is-top-level (moderate): Ensure the banner landmark is at top level (Best Practice - does not affect pass/fail)
- landmark-complementary-is-top-level (moderate): Ensure the complementary landmark or aside is at top level (Best Practice - does not affect pass/fail)
- landmark-no-duplicate-banner (moderate): Ensure the document has at most one banner landmark (Best Practice - does not affect pass/fail)
- landmark-unique (moderate): Ensure landmarks are unique (Best Practice - does not affect pass/fail)
Sample 20 (GPT-5 Mini)
Instruction set: Control
PASS | Latency 0.00s
Axe WCAG: 0 | BP: 14
Assertions ✅
- ✅: Has an h1 (R): pass
- ✅: Has single h1 (BP): pass
- ✅: Has at least one h2 (R): pass
- ✅: Has a single banner (R): pass
- ✅: Has a single maincontent (R): pass
- ✅: Has at least one navigation (R): pass
- ✅: Has a single footer (R): pass
Axe Best Practice Issues (14) ⚠️
- aria-allowed-role (minor): Ensure role attribute has an appropriate value for the element (Best Practice - does not affect pass/fail)
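The sample above passes despite 14 best-practice issues, while other samples with zero Axe WCAG failures still fail when an (R) assertion fails. A rule consistent with the listed samples can be sketched as follows. This is an inference from the log, not the harness's actual code: the `SampleResult` type and `verdict` function are hypothetical, and it assumes that (R) marks a required assertion and that (BP) assertions, like Axe best-practice issues, do not affect the verdict.

```python
from dataclasses import dataclass, field


@dataclass
class SampleResult:
    """Hypothetical container for one sample's results."""
    axe_wcag_failures: int
    best_practice_issues: int
    # assertion name -> (kind, passed); kind is "R" or "BP" as in the log
    assertions: dict[str, tuple[str, bool]] = field(default_factory=dict)


def verdict(sample: SampleResult) -> str:
    """PASS only with zero Axe WCAG failures and every (R) assertion passing."""
    required_ok = all(
        passed for kind, passed in sample.assertions.values() if kind == "R"
    )
    if sample.axe_wcag_failures == 0 and required_ok:
        return "PASS"
    return "FAIL"


# Control sample 20: no WCAG failures, all assertions pass, 14 BP issues.
sample_20 = SampleResult(
    axe_wcag_failures=0,
    best_practice_issues=14,
    assertions={
        "Has an h1": ("R", True),
        "Has single h1": ("BP", True),
        "Has a single banner": ("R", True),
    },
)
print(verdict(sample_20))
```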
Sample 21 (GPT-5 Mini)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 3 | BP: 10
Assertions ❌
- ✅: Has an h1 (R): pass
- ✅: Has single h1 (BP): pass
- ❌: Has at least one h2 (R): fail
- ✅: Has a single banner (R): pass
- ✅: Has a single maincontent (R): pass
- ✅: Has at least one navigation (R): pass
- ❌: Has a single footer (R): fail
Axe WCAG Failures (3) ❌
- (3x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (10) ⚠️
- aria-allowed-role (minor): Ensure role attribute has an appropriate value for the element (Best Practice - does not affect pass/fail)
- heading-order (moderate): Ensure the order of headings is semantically correct (Best Practice - does not affect pass/fail)
- landmark-complementary-is-top-level (moderate): Ensure the complementary landmark or aside is at top level (Best Practice - does not affect pass/fail)
- landmark-contentinfo-is-top-level (moderate): Ensure the contentinfo landmark is at top level (Best Practice - does not affect pass/fail)
- landmark-no-duplicate-contentinfo (moderate): Ensure the document has at most one contentinfo landmark (Best Practice - does not affect pass/fail)
- landmark-unique (moderate): Ensure landmarks are unique (Best Practice - does not affect pass/fail)
Sample 22 (GPT-5 Mini)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 9
Assertions ❌
- ✅: Has an h1 (R): pass
- ✅: Has single h1 (BP): pass
- ❌: Has at least one h2 (R): fail
- ✅: Has a single banner (R): pass
- ✅: Has a single maincontent (R): pass
- ✅: Has at least one navigation (R): pass
- ✅: Has a single footer (R): pass
Axe WCAG Failures (9) ❌
- (1x) - aria-hidden-focus (serious): Ensure aria-hidden elements are not focusable nor contain focusable elements
- (8x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Sample 23 (GPT-5 Mini)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 6 | BP: 3
Assertions ✅
- ✅: Has an h1 (R): pass
- ✅: Has single h1 (BP): pass
- ✅: Has at least one h2 (R): pass
- ✅: Has a single banner (R): pass
- ✅: Has a single maincontent (R): pass
- ✅: Has at least one navigation (R): pass
- ✅: Has a single footer (R): pass
Axe WCAG Failures (6) ❌
- (6x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (3) ⚠️
- heading-order (moderate): Ensure the order of headings is semantically correct (Best Practice - does not affect pass/fail)
- landmark-complementary-is-top-level (moderate): Ensure the complementary landmark or aside is at top level (Best Practice - does not affect pass/fail)
Sample 24 (GPT-5 Mini)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 18 | BP: 6
Assertions ❌
- ✅: Has an h1 (R): pass
- ✅: Has single h1 (BP): pass
- ❌: Has at least one h2 (R): fail
- ❌: Has a single banner (R): fail
- ❌: Has a single maincontent (R): fail
- ✅: Has at least one navigation (R): pass
- ✅: Has a single footer (R): pass
Axe WCAG Failures (18) ❌
- (18x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (6) ⚠️
- landmark-banner-is-top-level (moderate): Ensure the banner landmark is at top level (Best Practice - does not affect pass/fail)
- landmark-main-is-top-level (moderate): Ensure the main landmark is at top level (Best Practice - does not affect pass/fail)
- landmark-no-duplicate-banner (moderate): Ensure the document has at most one banner landmark (Best Practice - does not affect pass/fail)
- landmark-no-duplicate-main (moderate): Ensure the document has at most one main landmark (Best Practice - does not affect pass/fail)
- landmark-unique (moderate): Ensure landmarks are unique (Best Practice - does not affect pass/fail)
Sample 0 (GPT-5 Mini)
Instruction set: 1. Basic
FAIL | Latency 0.00s
Axe WCAG: 15 | BP: 6
Assertions ❌
- ✅: Has an h1 (R): pass
- ✅: Has single h1 (BP): pass
- ✅: Has at least one h2 (R): pass
- ❌: Has a single banner (R): fail
- ✅: Has a single maincontent (R): pass
- ✅: Has at least one navigation (R): pass
- ✅: Has a single footer (R): pass
Axe WCAG Failures (15) ❌
- (15x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (6) ⚠️
- aria-allowed-role (minor): Ensure role attribute has an appropriate value for the element (Best Practice - does not affect pass/fail)
- landmark-banner-is-top-level (moderate): Ensure the banner landmark is at top level (Best Practice - does not affect pass/fail)
- landmark-no-duplicate-banner (moderate): Ensure the document has at most one banner landmark (Best Practice - does not affect pass/fail)
- landmark-unique (moderate): Ensure landmarks are unique (Best Practice - does not affect pass/fail)
Sample 1 (GPT-5 Mini)
Instruction set: 1. Basic
FAIL | Latency 0.00s
Axe WCAG: 4
Assertions ✅
- ✅: Has an h1 (R): pass
- ✅: Has single h1 (BP): pass
- ✅: Has at least one h2 (R): pass
- ✅: Has a single banner (R): pass
- ✅: Has a single maincontent (R): pass
- ✅: Has at least one navigation (R): pass
- ✅: Has a single footer (R): pass
Axe WCAG Failures (4) ❌
- (1x) - aria-required-children (critical): Ensure elements with an ARIA role that require child roles contain them
- (3x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Sample 2 (GPT-5 Mini)
Instruction set: 1. Basic
FAIL | Latency 0.00s
Axe WCAG: 11 | BP: 6
Assertions ❌
- ✅: Has an h1 (R): pass
- ✅: Has single h1 (BP): pass
- ✅: Has at least one h2 (R): pass
- ❌: Has a single banner (R): fail
- ✅: Has a single maincontent (R): pass
- ✅: Has at least one navigation (R): pass
- ✅: Has a single footer (R): pass
Axe WCAG Failures (11) ❌
- (11x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (6) ⚠️
- aria-allowed-role (minor): Ensure role attribute has an appropriate value for the element (Best Practice - does not affect pass/fail)
- landmark-banner-is-top-level (moderate): Ensure the banner landmark is at top level (Best Practice - does not affect pass/fail)
- landmark-no-duplicate-banner (moderate): Ensure the document has at most one banner landmark (Best Practice - does not affect pass/fail)
- landmark-unique (moderate): Ensure landmarks are unique (Best Practice - does not affect pass/fail)
Sample 3 (GPT-5 Mini)
Instruction set: 1. Basic
FAIL | Latency 0.00s
Axe WCAG: 4 | BP: 5
Assertions ✅
- ✅: Has an h1 (R): pass
- ✅: Has single h1 (BP): pass
- ✅: Has at least one h2 (R): pass
- ✅: Has a single banner (R): pass
- ✅: Has a single maincontent (R): pass
- ✅: Has at least one navigation (R): pass
- ✅: Has a single footer (R): pass
Axe WCAG Failures (4) ❌
- (4x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (5) ⚠️
- aria-allowed-role (minor): Ensure role attribute has an appropriate value for the element (Best Practice - does not affect pass/fail)
Sample 4 (GPT-5 Mini)
Instruction set: 1. Basic
PASS | Latency 0.00s
Axe WCAG: 0
Assertions ✅
- ✅: Has an h1 (R): pass
- ✅: Has single h1 (BP): pass
- ✅: Has at least one h2 (R): pass
- ✅: Has a single banner (R): pass
- ✅: Has a single maincontent (R): pass
- ✅: Has at least one navigation (R): pass
- ✅: Has a single footer (R): pass
Sample 0 (GPT-5 Mini)
Instruction set: 0. Minimal
FAIL | Latency 0.00s
Axe WCAG: 14 | BP: 16
Assertions ❌
- ✅: Has an h1 (R): pass
- ✅: Has single h1 (BP): pass
- ✅: Has at least one h2 (R): pass
- ❌: Has a single banner (R): fail
- ✅: Has a single maincontent (R): pass
- ✅: Has at least one navigation (R): pass
- ✅: Has a single footer (R): pass
Axe WCAG Failures (14) ❌
- (14x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (16) ⚠️
- aria-allowed-role (minor): Ensure role attribute has an appropriate value for the element (Best Practice - does not affect pass/fail)
- landmark-banner-is-top-level (moderate): Ensure the banner landmark is at top level (Best Practice - does not affect pass/fail)
- landmark-no-duplicate-banner (moderate): Ensure the document has at most one banner landmark (Best Practice - does not affect pass/fail)
- landmark-unique (moderate): Ensure landmarks are unique (Best Practice - does not affect pass/fail)
Sample 1 (GPT-5 Mini)
Instruction set: 0. Minimal
FAIL | Latency 0.00s
Axe WCAG: 5 | BP: 3
Assertions ✅
- ✅: Has an h1 (R): pass
- ✅: Has single h1 (BP): pass
- ✅: Has at least one h2 (R): pass
- ✅: Has a single banner (R): pass
- ✅: Has a single maincontent (R): pass
- ✅: Has at least one navigation (R): pass
- ✅: Has a single footer (R): pass
Axe WCAG Failures (5) ❌
- (5x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (3) ⚠️
- landmark-complementary-is-top-level (moderate): Ensure the complementary landmark or aside is at top level (Best Practice - does not affect pass/fail)
- landmark-unique (moderate): Ensure landmarks are unique (Best Practice - does not affect pass/fail)
Sample 2 (GPT-5 Mini)
Instruction set: 0. Minimal
FAIL | Latency 0.00s
Axe WCAG: 17 | BP: 6
Assertions ✅
- ✅: Has an h1 (R): pass
- ✅: Has single h1 (BP): pass
- ✅: Has at least one h2 (R): pass
- ✅: Has a single banner (R): pass
- ✅: Has a single maincontent (R): pass
- ✅: Has at least one navigation (R): pass
- ✅: Has a single footer (R): pass
Axe WCAG Failures (17) ❌
- (17x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (6) ⚠️
- aria-allowed-role (minor): Ensure role attribute has an appropriate value for the element (Best Practice - does not affect pass/fail)
Sample 3 (GPT-5 Mini)
Instruction set: 0. Minimal
FAIL | Latency 0.00s
Axe WCAG: 0 | BP: 10
Assertions ❌
- ✅: Has an h1 (R): pass
- ✅: Has single h1 (BP): pass
- ✅: Has at least one h2 (R): pass
- ❌: Has a single banner (R): fail
- ✅: Has a single maincontent (R): pass
- ✅: Has at least one navigation (R): pass
- ✅: Has a single footer (R): pass
Axe Best Practice Issues (10) ⚠️
- aria-allowed-role (minor): Ensure role attribute has an appropriate value for the element (Best Practice - does not affect pass/fail)
- landmark-banner-is-top-level (moderate): Ensure the banner landmark is at top level (Best Practice - does not affect pass/fail)
- landmark-complementary-is-top-level (moderate): Ensure the complementary landmark or aside is at top level (Best Practice - does not affect pass/fail)
- landmark-no-duplicate-banner (moderate): Ensure the document has at most one banner landmark (Best Practice - does not affect pass/fail)
- landmark-unique (moderate): Ensure landmarks are unique (Best Practice - does not affect pass/fail)
Sample 4 (GPT-5 Mini)
Instruction set: 0. Minimal
FAIL | Latency 0.00s
Axe WCAG: 16
Assertions ✅
- ✅: Has an h1 (R): pass
- ✅: Has single h1 (BP): pass
- ✅: Has at least one h2 (R): pass
- ✅: Has a single banner (R): pass
- ✅: Has a single maincontent (R): pass
- ✅: Has at least one navigation (R): pass
- ✅: Has a single footer (R): pass
Axe WCAG Failures (16) ❌
- (16x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Sample 0 (GPT-5 Mini)
Instruction set: 2. Detailed Instructions
PASS | Latency 0.00s
Axe WCAG: 0 | BP: 3
Assertions ✅
- ✅: Has an h1 (R): pass
- ✅: Has single h1 (BP): pass
- ✅: Has at least one h2 (R): pass
- ✅: Has a single banner (R): pass
- ✅: Has a single maincontent (R): pass
- ✅: Has at least one navigation (R): pass
- ✅: Has a single footer (R): pass
Axe Best Practice Issues (3) ⚠️
- aria-allowed-role (minor): Ensure role attribute has an appropriate value for the element (Best Practice - does not affect pass/fail)
Sample 1 (GPT-5 Mini)
Instruction set: 2. Detailed Instructions
PASS | Latency 0.00s
Axe WCAG: 0
Assertions ✅
- ✅: Has an h1 (R): pass
- ✅: Has single h1 (BP): pass
- ✅: Has at least one h2 (R): pass
- ✅: Has a single banner (R): pass
- ✅: Has a single maincontent (R): pass
- ✅: Has at least one navigation (R): pass
- ✅: Has a single footer (R): pass
Sample 2 (GPT-5 Mini)
Instruction set: 2. Detailed Instructions
FAIL | Latency 0.00s
Axe WCAG: 0 | BP: 10
Assertions ❌
- ✅: Has an h1 (R): pass
- ✅: Has single h1 (BP): pass
- ✅: Has at least one h2 (R): pass
- ❌: Has a single banner (R): fail
- ✅: Has a single maincontent (R): pass
- ✅: Has at least one navigation (R): pass
- ✅: Has a single footer (R): pass
Axe Best Practice Issues (10) ⚠️
- aria-allowed-role (minor): Ensure role attribute has an appropriate value for the element (Best Practice - does not affect pass/fail)
- landmark-banner-is-top-level (moderate): Ensure the banner landmark is at top level (Best Practice - does not affect pass/fail)
- landmark-no-duplicate-banner (moderate): Ensure the document has at most one banner landmark (Best Practice - does not affect pass/fail)
- landmark-unique (moderate): Ensure landmarks are unique (Best Practice - does not affect pass/fail)
Sample 3 (GPT-5 Mini)
Instruction set: 2. Detailed Instructions
FAIL | Latency 0.00s
Axe WCAG: 5 | BP: 6
Assertions ✅
- ✅: Has an h1 (R): pass
- ✅: Has single h1 (BP): pass
- ✅: Has at least one h2 (R): pass
- ✅: Has a single banner (R): pass
- ✅: Has a single maincontent (R): pass
- ✅: Has at least one navigation (R): pass
- ✅: Has a single footer (R): pass
Axe WCAG Failures (5) ❌
- (5x) - listitem (serious): Ensure <li> elements are used semantically
Axe Best Practice Issues (6) ⚠️
- aria-allowed-role (minor): Ensure role attribute has an appropriate value for the element (Best Practice - does not affect pass/fail)
- landmark-complementary-is-top-level (moderate): Ensure the complementary landmark or aside is at top level (Best Practice - does not affect pass/fail)
- landmark-unique (moderate): Ensure landmarks are unique (Best Practice - does not affect pass/fail)
Sample 4 (GPT-5 Mini)
Instruction set: 2. Detailed Instructions
FAIL | Latency 0.00s
Axe WCAG: 4 | BP: 7
Assertions ❌
- ✅: Has an h1 (R): pass
- ✅: Has single h1 (BP): pass
- ✅: Has at least one h2 (R): pass
- ❌: Has a single banner (R): fail
- ✅: Has a single maincontent (R): pass
- ✅: Has at least one navigation (R): pass
- ✅: Has a single footer (R): pass
Axe WCAG Failures (4) ❌
- (4x) - nested-interactive (serious): Ensure interactive controls are not nested as they are not always announced by screen readers or can cause focus problems for assistive technologies
Axe Best Practice Issues (7) ⚠️
- aria-allowed-role (minor): Ensure role attribute has an appropriate value for the element (Best Practice - does not affect pass/fail)
- heading-order (moderate): Ensure the order of headings is semantically correct (Best Practice - does not affect pass/fail)
- label-title-only (serious): Ensure that every form element has a visible label and is not solely labeled using hidden labels, or the title or aria-describedby attributes (Best Practice - does not affect pass/fail)
- landmark-banner-is-top-level (moderate): Ensure the banner landmark is at top level (Best Practice - does not affect pass/fail)
- landmark-no-duplicate-banner (moderate): Ensure the document has at most one banner landmark (Best Practice - does not affect pass/fail)
- landmark-unique (moderate): Ensure landmarks are unique (Best Practice - does not affect pass/fail)
GPT-5.2
Pass rates by instruction set: 4% | 60% | 100% | 100%
Aggregates are shown when filtering to a specific instruction set.
Samples: 25 | Passes: 1
| pass@1 | pass@5 | pass@10 |
|---|---|---|
| 4% | 20% | 40% |
Samples: 5 | Passes: 3
| pass@1 | pass@5 | pass@10 |
|---|---|---|
| 60% | 100% | 100% |
Samples: 5 | Passes: 5
| pass@1 | pass@5 | pass@10 |
|---|---|---|
| 100% | 100% | 100% |
Samples: 5 | Passes: 5
| pass@1 | pass@5 | pass@10 |
|---|---|---|
| 100% | 100% | 100% |
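The pass@k values in the tables above are consistent with the standard unbiased estimator, 1 - C(n - c, k) / C(n, k), where n is the number of samples and c the number of passes. The sketch below reproduces them. The function name is illustrative, and clamping k to n when fewer than k samples exist is an assumption chosen to match the 5-sample rows, which report 100% for pass@10.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Estimate the probability that at least one of k drawn samples passes.

    n: total samples, c: passing samples, k: draw size.
    """
    if c == 0:
        return 0.0
    k = min(k, n)  # assumption: cannot draw more samples than exist
    if n - c < k:
        # Too few failing samples to fill a draw of size k with failures only.
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)


# First aggregate above: 25 samples, 1 pass.
for k in (1, 5, 10):
    print(f"pass@{k}: {pass_at_k(25, 1, k):.0%}")
```

With 25 samples and 1 pass this gives 4%, 20%, and 40%, matching the first table. With 5 samples and 3 passes it gives 60% for pass@1 and 100% for the larger draws.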
Sample 0 (GPT-5.2)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 30 | BP: 7
Assertions ✅
- ✅: Has an h1 (R): pass
- ✅: Has single h1 (BP): pass
- ✅: Has at least one h2 (R): pass
- ✅: Has a single banner (R): pass
- ✅: Has a single maincontent (R): pass
- ✅: Has at least one navigation (R): pass
- ✅: Has a single footer (R): pass
Axe WCAG Failures (30) ❌
- (30x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (7) ⚠️
- aria-allowed-role (minor): Ensure role attribute has an appropriate value for the element (Best Practice - does not affect pass/fail)
Sample 1 (GPT-5.2)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 12 | BP: 2
Assertions ❌
- ✅: Has an h1 (R): pass
- ✅: Has single h1 (BP): pass
- ✅: Has at least one h2 (R): pass
- ✅: Has a single banner (R): pass
- ✅: Has a single maincontent (R): pass
- ✅: Has at least one navigation (R): pass
- ❌: Has a single footer (R): fail
Axe WCAG Failures (12) ❌
- (12x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (2) ⚠️
- landmark-complementary-is-top-level (moderate): Ensure the complementary landmark or aside is at top level (Best Practice - does not affect pass/fail)
Sample 2 (GPT-5.2)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 0 | BP: 4
Assertions ❌
- ✅: Has an h1 (R): pass
- ✅: Has single h1 (BP): pass
- ✅: Has at least one h2 (R): pass
- ❌: Has a single banner (R): fail
- ✅: Has a single maincontent (R): pass
- ✅: Has at least one navigation (R): pass
- ✅: Has a single footer (R): pass
Axe Best Practice Issues (4) ⚠️
- landmark-banner-is-top-level (moderate): Ensure the banner landmark is at top level (Best Practice - does not affect pass/fail)
- landmark-complementary-is-top-level (moderate): Ensure the complementary landmark or aside is at top level (Best Practice - does not affect pass/fail)
- landmark-no-duplicate-banner (moderate): Ensure the document has at most one banner landmark (Best Practice - does not affect pass/fail)
- landmark-unique (moderate): Ensure landmarks are unique (Best Practice - does not affect pass/fail)
Sample 3 (GPT-5.2)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 95 | BP: 1
Assertions ✅
- ✅: Has an h1 (R): pass
- ✅: Has single h1 (BP): pass
- ✅: Has at least one h2 (R): pass
- ✅: Has a single banner (R): pass
- ✅: Has a single maincontent (R): pass
- ✅: Has at least one navigation (R): pass
- ✅: Has a single footer (R): pass
Axe WCAG Failures (95) ❌
- (3x) - aria-prohibited-attr (serious): Ensure ARIA attributes are not prohibited for an element's role
- (92x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (1) ⚠️
- heading-order (moderate): Ensure the order of headings is semantically correct (Best Practice - does not affect pass/fail)
Sample 4 (GPT-5.2)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 20 | BP: 1
Assertions ✅
- ✅: Has an h1 (R): pass
- ✅: Has single h1 (BP): pass
- ✅: Has at least one h2 (R): pass
- ✅: Has a single banner (R): pass
- ✅: Has a single maincontent (R): pass
- ✅: Has at least one navigation (R): pass
- ✅: Has a single footer (R): pass
Axe WCAG Failures (20) ❌
- (20x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (1) ⚠️
- landmark-complementary-is-top-level (moderate): Ensure the complementary landmark or aside is at top level (Best Practice - does not affect pass/fail)
Sample 5 (GPT-5.2)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 96 | BP: 8
Assertions ❌
- ✅: Has an h1 (R): pass
- ✅: Has single h1 (BP): pass
- ✅: Has at least one h2 (R): pass
- ❌: Has a single banner (R): fail
- ✅: Has a single maincontent (R): pass
- ✅: Has at least one navigation (R): pass
- ✅: Has a single footer (R): pass
Axe WCAG Failures (96) ❌
- (96x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (8) ⚠️
- aria-allowed-role (minor): Ensure role attribute has an appropriate value for the element (Best Practice - does not affect pass/fail)
- landmark-banner-is-top-level (moderate): Ensure the banner landmark is at top level (Best Practice - does not affect pass/fail)
- landmark-complementary-is-top-level (moderate): Ensure the complementary landmark or aside is at top level (Best Practice - does not affect pass/fail)
- landmark-no-duplicate-banner (moderate): Ensure the document has at most one banner landmark (Best Practice - does not affect pass/fail)
- landmark-unique (moderate): Ensure landmarks are unique (Best Practice - does not affect pass/fail)
Sample 6 (GPT-5.2)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 30 | BP: 4
Assertions ❌
- ✅: Has an h1 (R): pass
- ❌: Has single h1 (BP): fail
- ✅: Has at least one h2 (R): pass
- ❌: Has a single banner (R): fail
- ✅: Has a single maincontent (R): pass
- ✅: Has at least one navigation (R): pass
- ✅: Has a single footer (R): pass
Axe WCAG Failures (30) ❌
- (3x) - aria-prohibited-attr (serious): Ensure ARIA attributes are not prohibited for an element's role
- (27x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (4) ⚠️
- heading-order (moderate): Ensure the order of headings is semantically correct (Best Practice - does not affect pass/fail)
- landmark-banner-is-top-level (moderate): Ensure the banner landmark is at top level (Best Practice - does not affect pass/fail)
- landmark-no-duplicate-banner (moderate): Ensure the document has at most one banner landmark (Best Practice - does not affect pass/fail)
- landmark-unique (moderate): Ensure landmarks are unique (Best Practice - does not affect pass/fail)
Sample 7 (GPT-5.2)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 67 | BP: 11
Assertions ✅
- ✅: Has an h1 (R): pass
- ✅: Has single h1 (BP): pass
- ✅: Has at least one h2 (R): pass
- ✅: Has a single banner (R): pass
- ✅: Has a single maincontent (R): pass
- ✅: Has at least one navigation (R): pass
- ✅: Has a single footer (R): pass
Axe WCAG Failures (67) ❌
- (3x) - aria-prohibited-attr (serious): Ensure ARIA attributes are not prohibited for an element's role
- (64x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (11) ⚠️
- aria-allowed-role (minor): Ensure role attribute has an appropriate value for the element (Best Practice - does not affect pass/fail)
- heading-order (moderate): Ensure the order of headings is semantically correct (Best Practice - does not affect pass/fail)
Sample 8 (GPT-5.2)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 4 | BP: 2
Assertions ❌
- ✅: Has an h1 (R): pass
- ✅: Has single h1 (BP): pass
- ✅: Has at least one h2 (R): pass
- ✅: Has a single banner (R): pass
- ✅: Has a single maincontent (R): pass
- ✅: Has at least one navigation (R): pass
- ❌: Has a single footer (R): fail
Axe WCAG Failures (4) ❌
- (4x) - aria-prohibited-attr (serious): Ensure ARIA attributes are not prohibited for an element's role
Axe Best Practice Issues (2) ⚠️
- heading-order (moderate): Ensure the order of headings is semantically correct (Best Practice - does not affect pass/fail)
Sample 9 (GPT-5.2)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 104 | BP: 2
Assertions ✅
- ✅: Has an h1 (R): pass
- ✅: Has single h1 (BP): pass
- ✅: Has at least one h2 (R): pass
- ✅: Has a single banner (R): pass
- ✅: Has a single maincontent (R): pass
- ✅: Has at least one navigation (R): pass
- ✅: Has a single footer (R): pass
Axe WCAG Failures (104) ❌
- (2x) - aria-prohibited-attr (serious): Ensure ARIA attributes are not prohibited for an element's role
- (102x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (2) ⚠️
- heading-order (moderate): Ensure the order of headings is semantically correct (Best Practice - does not affect pass/fail)
- landmark-complementary-is-top-level (moderate): Ensure the complementary landmark or aside is at top level (Best Practice - does not affect pass/fail)
Sample 10 (GPT-5.2)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 0 | BP: 4
Assertions ❌
- ✅: Has an h1 (R): pass
- ✅: Has single h1 (BP): pass
- ✅: Has at least one h2 (R): pass
- ✅: Has a single banner (R): pass
- ✅: Has a single maincontent (R): pass
- ✅: Has at least one navigation (R): pass
- ❌: Has a single footer (R): fail
Axe Best Practice Issues (4) ⚠️
- heading-order (moderate): Ensure the order of headings is semantically correct (Best Practice - does not affect pass/fail)
- landmark-complementary-is-top-level (moderate): Ensure the complementary landmark or aside is at top level (Best Practice - does not affect pass/fail)
Sample 11 (GPT-5.2)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 43 | BP: 3
Assertions ✅
- ✅: Has an h1 (R): pass
- ✅: Has single h1 (BP): pass
- ✅: Has at least one h2 (R): pass
- ✅: Has a single banner (R): pass
- ✅: Has a single maincontent (R): pass
- ✅: Has at least one navigation (R): pass
- ✅: Has a single footer (R): pass
Axe WCAG Failures (43) ❌
- (43x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (3) ⚠️
- heading-order (moderate): Ensure the order of headings is semantically correct (Best Practice - does not affect pass/fail)
- landmark-complementary-is-top-level (moderate): Ensure the complementary landmark or aside is at top level (Best Practice - does not affect pass/fail)
Sample 12 (GPT-5.2)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 26 | BP: 4
Assertions ❌
- ✅: Has an h1 (R): pass
- ✅: Has single h1 (BP): pass
- ✅: Has at least one h2 (R): pass
- ❌: Has a single banner (R): fail
- ✅: Has a single maincontent (R): pass
- ✅: Has at least one navigation (R): pass
- ✅: Has a single footer (R): pass
Axe WCAG Failures (26) ❌
- (26x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (4) ⚠️
- heading-order (moderate): Ensure the order of headings is semantically correct (Best Practice - does not affect pass/fail)
- landmark-banner-is-top-level (moderate): Ensure the banner landmark is at top level (Best Practice - does not affect pass/fail)
- landmark-no-duplicate-banner (moderate): Ensure the document has at most one banner landmark (Best Practice - does not affect pass/fail)
- landmark-unique (moderate): Ensure landmarks are unique (Best Practice - does not affect pass/fail)
Sample 13 (GPT-5.2)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 34 | BP: 15
Assertions ❌
- ✅: Has an h1 (R): pass
- ✅: Has single h1 (BP): pass
- ✅: Has at least one h2 (R): pass
- ✅: Has a single banner (R): pass
- ✅: Has a single maincontent (R): pass
- ✅: Has at least one navigation (R): pass
- ❌: Has a single footer (R): fail
Axe WCAG Failures (34) ❌
- (4x) - aria-required-parent (critical): Ensure elements with an ARIA role that require parent roles are contained by them
- (30x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (15) ⚠️
- aria-allowed-role (minor): Ensure role attribute has an appropriate value for the element (Best Practice - does not affect pass/fail)
- heading-order (moderate): Ensure the order of headings is semantically correct (Best Practice - does not affect pass/fail)
Sample 14 (GPT-5.2)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 137 | BP: 3
Assertions ✅
- ✅: Has an h1 (R): pass
- ✅: Has single h1 (BP): pass
- ✅: Has at least one h2 (R): pass
- ✅: Has a single banner (R): pass
- ✅: Has a single maincontent (R): pass
- ✅: Has at least one navigation (R): pass
- ✅: Has a single footer (R): pass
Axe WCAG Failures (137) ❌
- (137x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (3) ⚠️
- heading-order (moderate): Ensure the order of headings is semantically correct (Best Practice - does not affect pass/fail)
- landmark-complementary-is-top-level (moderate): Ensure the complementary landmark or aside is at top level (Best Practice - does not affect pass/fail)
Sample 15 (GPT-5.2)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 22 | BP: 17
Assertions ✅
- ✅: Has an h1 (R): pass
- ✅: Has single h1 (BP): pass
- ✅: Has at least one h2 (R): pass
- ✅: Has a single banner (R): pass
- ✅: Has a single maincontent (R): pass
- ✅: Has at least one navigation (R): pass
- ✅: Has a single footer (R): pass
Axe WCAG Failures (22) ❌
- (22x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (17) ⚠️
- aria-allowed-role (minor): Ensure role attribute has an appropriate value for the element (Best Practice - does not affect pass/fail)
- heading-order (moderate): Ensure the order of headings is semantically correct (Best Practice - does not affect pass/fail)
Sample 16 (GPT-5.2)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 108 | BP: 1
Assertions ✅
- ✅: Has an h1 (R): pass
- ❌: Has single h1 (BP): fail
- ✅: Has at least one h2 (R): pass
- ✅: Has a single banner (R): pass
- ✅: Has a single maincontent (R): pass
- ✅: Has at least one navigation (R): pass
- ✅: Has a single footer (R): pass
Axe WCAG Failures (108) ❌
- (108x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (1) ⚠️
- heading-order (moderate): Ensure the order of headings is semantically correct (Best Practice - does not affect pass/fail)
Sample 17 (GPT-5.2)
Instruction set: Control
PASS | Latency 0.00s
Axe WCAG: 0 | BP: 1
Assertions ✅
- ✅: Has an h1 (R): pass
- ✅: Has single h1 (BP): pass
- ✅: Has at least one h2 (R): pass
- ✅: Has a single banner (R): pass
- ✅: Has a single maincontent (R): pass
- ✅: Has at least one navigation (R): pass
- ✅: Has a single footer (R): pass
Axe Best Practice Issues (1) ⚠️
- heading-order (moderate): Ensure the order of headings is semantically correct (Best Practice - does not affect pass/fail)
Sample 18 (GPT-5.2)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 29 | BP: 4
Assertions ❌
- ✅: Has an h1 (R): pass
- ✅: Has single h1 (BP): pass
- ✅: Has at least one h2 (R): pass
- ❌: Has a single banner (R): fail
- ✅: Has a single maincontent (R): pass
- ✅: Has at least one navigation (R): pass
- ✅: Has a single footer (R): pass
Axe WCAG Failures (29) ❌
- (3x) - aria-prohibited-attr (serious): Ensure ARIA attributes are not prohibited for an element's role
- (26x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (4) ⚠️
- heading-order (moderate): Ensure the order of headings is semantically correct (Best Practice - does not affect pass/fail)
- landmark-banner-is-top-level (moderate): Ensure the banner landmark is at top level (Best Practice - does not affect pass/fail)
- landmark-no-duplicate-banner (moderate): Ensure the document has at most one banner landmark (Best Practice - does not affect pass/fail)
- landmark-unique (moderate): Ensure landmarks are unique (Best Practice - does not affect pass/fail)
Sample 19 (GPT-5.2)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 87 | BP: 2
Assertions ✅
- ✅: Has an h1 (R): pass
- ✅: Has single h1 (BP): pass
- ✅: Has at least one h2 (R): pass
- ✅: Has a single banner (R): pass
- ✅: Has a single maincontent (R): pass
- ✅: Has at least one navigation (R): pass
- ✅: Has a single footer (R): pass
Axe WCAG Failures (87) ❌
- (87x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (2) ⚠️
- heading-order (moderate): Ensure the order of headings is semantically correct (Best Practice - does not affect pass/fail)
- landmark-complementary-is-top-level (moderate): Ensure the complementary landmark or aside is at top level (Best Practice - does not affect pass/fail)
Sample 20 (GPT-5.2)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 27 | BP: 3
Assertions ✅
- ✅: Has an h1 (R): pass
- ✅: Has single h1 (BP): pass
- ✅: Has at least one h2 (R): pass
- ✅: Has a single banner (R): pass
- ✅: Has a single maincontent (R): pass
- ✅: Has at least one navigation (R): pass
- ✅: Has a single footer (R): pass
Axe WCAG Failures (27) ❌
- (27x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (3) ⚠️
- heading-order (moderate): Ensure the order of headings is semantically correct (Best Practice - does not affect pass/fail)
- landmark-complementary-is-top-level (moderate): Ensure the complementary landmark or aside is at top level (Best Practice - does not affect pass/fail)
Sample 21 (GPT-5.2)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 29 | BP: 10
Assertions ✅
- ✅: Has an h1 (R): pass
- ❌: Has single h1 (BP): fail
- ✅: Has at least one h2 (R): pass
- ✅: Has a single banner (R): pass
- ✅: Has a single maincontent (R): pass
- ✅: Has at least one navigation (R): pass
- ✅: Has a single footer (R): pass
Axe WCAG Failures (29) ❌
- (29x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (10) ⚠️
- aria-allowed-role (minor): Ensure role attribute has an appropriate value for the element (Best Practice - does not affect pass/fail)
- heading-order (moderate): Ensure the order of headings is semantically correct (Best Practice - does not affect pass/fail)
- landmark-complementary-is-top-level (moderate): Ensure the complementary landmark or aside is at top level (Best Practice - does not affect pass/fail)
Sample 22 (GPT-5.2)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 29 | BP: 10
Assertions ✅
- ✅: Has an h1 (R): pass
- ✅: Has single h1 (BP): pass
- ✅: Has at least one h2 (R): pass
- ✅: Has a single banner (R): pass
- ✅: Has a single maincontent (R): pass
- ✅: Has at least one navigation (R): pass
- ✅: Has a single footer (R): pass
Axe WCAG Failures (29) ❌
- (29x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (10) ⚠️
- aria-allowed-role (minor): Ensure role attribute has an appropriate value for the element (Best Practice - does not affect pass/fail)
- heading-order (moderate): Ensure the order of headings is semantically correct (Best Practice - does not affect pass/fail)
- landmark-complementary-is-top-level (moderate): Ensure the complementary landmark or aside is at top level (Best Practice - does not affect pass/fail)
Sample 23 (GPT-5.2)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 6 | BP: 2
Assertions ✅
- ✅: Has an h1 (R): pass
- ✅: Has single h1 (BP): pass
- ✅: Has at least one h2 (R): pass
- ✅: Has a single banner (R): pass
- ✅: Has a single maincontent (R): pass
- ✅: Has at least one navigation (R): pass
- ✅: Has a single footer (R): pass
Axe WCAG Failures (6) ❌
- (6x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (2) ⚠️
- heading-order (moderate): Ensure the order of headings is semantically correct (Best Practice - does not affect pass/fail)
- landmark-complementary-is-top-level (moderate): Ensure the complementary landmark or aside is at top level (Best Practice - does not affect pass/fail)
Sample 24 (GPT-5.2)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 33 | BP: 2
Assertions ✅
- ✅: Has an h1 (R): pass
- ✅: Has single h1 (BP): pass
- ✅: Has at least one h2 (R): pass
- ✅: Has a single banner (R): pass
- ✅: Has a single maincontent (R): pass
- ✅: Has at least one navigation (R): pass
- ✅: Has a single footer (R): pass
Axe WCAG Failures (33) ❌
- (33x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (2) ⚠️
- heading-order (moderate): Ensure the order of headings is semantically correct (Best Practice - does not affect pass/fail)
- landmark-complementary-is-top-level (moderate): Ensure the complementary landmark or aside is at top level (Best Practice - does not affect pass/fail)
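The verdicts in the samples above appear consistent with a simple rule: a sample passes only when it has zero Axe WCAG failures and every required (R) assertion passes, while best-practice (BP) findings are reported but ignored. The Python sketch below encodes that reading. The `Assertion` type and function names are hypothetical and this is not the benchmark's actual code.

```python
from dataclasses import dataclass


@dataclass
class Assertion:
    name: str
    kind: str     # "R" for required, "BP" for best practice
    passed: bool


def assertions_ok(assertions: list[Assertion]) -> bool:
    """Only required assertions count; BP assertions never fail the group."""
    return all(a.passed for a in assertions if a.kind == "R")


def sample_passes(axe_wcag_failures: int, assertions: list[Assertion]) -> bool:
    """Assumed verdict rule: no Axe WCAG failures and all required assertions pass."""
    return axe_wcag_failures == 0 and assertions_ok(assertions)


def make(banner=True, footer=True, single_h1=True) -> list[Assertion]:
    """Build the seven assertions listed per sample, with selected failures."""
    return [
        Assertion("Has an h1", "R", True),
        Assertion("Has single h1", "BP", single_h1),
        Assertion("Has at least one h2", "R", True),
        Assertion("Has a single banner", "R", banner),
        Assertion("Has a single maincontent", "R", True),
        Assertion("Has at least one navigation", "R", True),
        Assertion("Has a single footer", "R", footer),
    ]


print(sample_passes(0, make()))                    # like Sample 17: PASS
print(sample_passes(0, make(banner=False)))        # like Sample 2: FAIL
print(assertions_ok(make(single_h1=False)))        # like Sample 16: assertions still OK
print(sample_passes(108, make(single_h1=False)))   # like Sample 16: FAIL on Axe count
```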
Sample 0 (GPT-5.2)
Instruction set: 1. Basic
PASS | Latency 0.00s
Axe WCAG: 0 | BP: 13
Assertions ✅
- ✅: Has an h1 (R): pass
- ✅: Has single h1 (BP): pass
- ✅: Has at least one h2 (R): pass
- ✅: Has a single banner (R): pass
- ✅: Has a single maincontent (R): pass
- ✅: Has at least one navigation (R): pass
- ✅: Has a single footer (R): pass
Axe Best Practice Issues (13) ⚠️
- aria-allowed-role (minor): Ensure role attribute has an appropriate value for the element (Best Practice - does not affect pass/fail)
- landmark-complementary-is-top-level (moderate): Ensure the complementary landmark or aside is at top level (Best Practice - does not affect pass/fail)
Sample 1 (GPT-5.2)
Instruction set: 1. Basic
PASS | Latency 0.00s
Axe WCAG: 0 | BP: 11
Assertions ✅
- ✅: Has an h1 (R): pass
- ✅: Has single h1 (BP): pass
- ✅: Has at least one h2 (R): pass
- ✅: Has a single banner (R): pass
- ✅: Has a single maincontent (R): pass
- ✅: Has at least one navigation (R): pass
- ✅: Has a single footer (R): pass
Axe Best Practice Issues (11) ⚠️
- aria-allowed-role (minor): Ensure role attribute has an appropriate value for the element (Best Practice - does not affect pass/fail)
- landmark-complementary-is-top-level (moderate): Ensure the complementary landmark or aside is at top level (Best Practice - does not affect pass/fail)
Sample 2 (GPT-5.2)
Instruction set: 1. Basic
PASS | Latency 0.00s
Axe WCAG: 0 | BP: 9
Assertions ✅
- ✅: Has an h1 (R): pass
- ✅: Has single h1 (BP): pass
- ✅: Has at least one h2 (R): pass
- ✅: Has a single banner (R): pass
- ✅: Has a single maincontent (R): pass
- ✅: Has at least one navigation (R): pass
- ✅: Has a single footer (R): pass
Axe Best Practice Issues (9) ⚠️
- aria-allowed-role (minor): Ensure role attribute has an appropriate value for the element (Best Practice - does not affect pass/fail)
Sample 3 (GPT-5.2)
Instruction set: 1. Basic
PASS | Latency 0.00s
Axe WCAG: 0 | BP: 11
Assertions ✅
- ✅: Has an h1 (R): pass
- ✅: Has single h1 (BP): pass
- ✅: Has at least one h2 (R): pass
- ✅: Has a single banner (R): pass
- ✅: Has a single maincontent (R): pass
- ✅: Has at least one navigation (R): pass
- ✅: Has a single footer (R): pass
Axe Best Practice Issues (11) ⚠️
- aria-allowed-role (minor): Ensure role attribute has an appropriate value for the element (Best Practice - does not affect pass/fail)
- landmark-complementary-is-top-level (moderate): Ensure the complementary landmark or aside is at top level (Best Practice - does not affect pass/fail)
Sample 4 (GPT-5.2)
Instruction set: 1. Basic
PASS | Latency 0.00s
Axe WCAG: 0 | BP: 12
Assertions ✅
- ✅: Has an h1 (R): pass
- ✅: Has single h1 (BP): pass
- ✅: Has at least one h2 (R): pass
- ✅: Has a single banner (R): pass
- ✅: Has a single maincontent (R): pass
- ✅: Has at least one navigation (R): pass
- ✅: Has a single footer (R): pass
Axe Best Practice Issues (12) ⚠️
- aria-allowed-role (minor): Ensure role attribute has an appropriate value for the element (Best Practice - does not affect pass/fail)
- landmark-complementary-is-top-level (moderate): Ensure the complementary landmark or aside is at top level (Best Practice - does not affect pass/fail)
Sample 0 (GPT-5.2)
Instruction set: 0. Minimal
PASS | Latency 0.00s
Axe WCAG: 0 | BP: 11
Assertions ✅
- ✅: Has an h1 (R): pass
- ✅: Has single h1 (BP): pass
- ✅: Has at least one h2 (R): pass
- ✅: Has a single banner (R): pass
- ✅: Has a single maincontent (R): pass
- ✅: Has at least one navigation (R): pass
- ✅: Has a single footer (R): pass
Axe Best Practice Issues (11) ⚠️
- aria-allowed-role (minor): Ensure role attribute has an appropriate value for the element (Best Practice - does not affect pass/fail)
- landmark-complementary-is-top-level (moderate): Ensure the complementary landmark or aside is at top level (Best Practice - does not affect pass/fail)
Sample 1 (GPT-5.2)
Instruction set: 0. Minimal
PASS | Latency 0.00s
Axe WCAG: 0 | BP: 11
Assertions ✅
- ✅: Has an h1 (R): pass
- ✅: Has single h1 (BP): pass
- ✅: Has at least one h2 (R): pass
- ✅: Has a single banner (R): pass
- ✅: Has a single maincontent (R): pass
- ✅: Has at least one navigation (R): pass
- ✅: Has a single footer (R): pass
Axe Best Practice Issues (11) ⚠️
- aria-allowed-role (minor): Ensure role attribute has an appropriate value for the element (Best Practice - does not affect pass/fail)
- landmark-complementary-is-top-level (moderate): Ensure the complementary landmark or aside is at top level (Best Practice - does not affect pass/fail)
Sample 2 (GPT-5.2)
Instruction set: 0. Minimal
FAIL | Latency 0.00s
Axe WCAG: 86 | BP: 11
Assertions ✅
- ✅: Has an h1 (R): pass
- ✅: Has single h1 (BP): pass
- ✅: Has at least one h2 (R): pass
- ✅: Has a single banner (R): pass
- ✅: Has a single maincontent (R): pass
- ✅: Has at least one navigation (R): pass
- ✅: Has a single footer (R): pass
Axe WCAG Failures (86) ❌
- (86x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (11) ⚠️
- aria-allowed-role (minor): Ensure role attribute has an appropriate value for the element (Best Practice - does not affect pass/fail)
- landmark-complementary-is-top-level (moderate): Ensure the complementary landmark or aside is at top level (Best Practice - does not affect pass/fail)
Sample 3 (GPT-5.2)
Instruction set: 0. Minimal
PASS | Latency 0.00s
Axe WCAG: 0 | BP: 4
Assertions ✅
- ✅: Has an h1 (R): pass
- ✅: Has single h1 (BP): pass
- ✅: Has at least one h2 (R): pass
- ✅: Has a single banner (R): pass
- ✅: Has a single maincontent (R): pass
- ✅: Has at least one navigation (R): pass
- ✅: Has a single footer (R): pass
Axe Best Practice Issues (4) ⚠️
- aria-allowed-role (minor): Ensure role attribute has an appropriate value for the element (Best Practice - does not affect pass/fail)
- landmark-complementary-is-top-level (moderate): Ensure the complementary landmark or aside is at top level (Best Practice - does not affect pass/fail)
Sample 4 (GPT-5.2)
Instruction set: 0. Minimal
FAIL | Latency 0.00s
Axe WCAG: 61 | BP: 13
Assertions ✅
- ✅: Has an h1 (R): pass
- ✅: Has single h1 (BP): pass
- ✅: Has at least one h2 (R): pass
- ✅: Has a single banner (R): pass
- ✅: Has a single maincontent (R): pass
- ✅: Has at least one navigation (R): pass
- ✅: Has a single footer (R): pass
Axe WCAG Failures (61) ❌
- (61x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (13) ⚠️
- aria-allowed-role (minor): Ensure role attribute has an appropriate value for the element (Best Practice - does not affect pass/fail)
- heading-order (moderate): Ensure the order of headings is semantically correct (Best Practice - does not affect pass/fail)
Sample 0 (GPT-5.2)
Instruction set: 2. Detailed Instructions
PASS | Latency 0.00s
Axe WCAG: 0 | BP: 12
Assertions ✅
- ✅: Has an h1 (R): pass
- ✅: Has single h1 (BP): pass
- ✅: Has at least one h2 (R): pass
- ✅: Has a single banner (R): pass
- ✅: Has a single maincontent (R): pass
- ✅: Has at least one navigation (R): pass
- ✅: Has a single footer (R): pass
Axe Best Practice Issues (12) ⚠️
- aria-allowed-role (minor): Ensure role attribute has an appropriate value for the element (Best Practice - does not affect pass/fail)
- landmark-complementary-is-top-level (moderate): Ensure the complementary landmark or aside is at top level (Best Practice - does not affect pass/fail)
- landmark-main-is-top-level (moderate): Ensure the main landmark is at top level (Best Practice - does not affect pass/fail)
Sample 1 (GPT-5.2)
Instruction set: 2. Detailed Instructions
PASS | Latency 0.00s
Axe WCAG: 0 | BP: 16
Assertions ✅
- ✅: Has an h1 (R): pass
- ✅: Has single h1 (BP): pass
- ✅: Has at least one h2 (R): pass
- ✅: Has a single banner (R): pass
- ✅: Has a single maincontent (R): pass
- ✅: Has at least one navigation (R): pass
- ✅: Has a single footer (R): pass
Axe Best Practice Issues (16) ⚠️
- aria-allowed-role (minor): Ensure role attribute has an appropriate value for the element (Best Practice - does not affect pass/fail)
- landmark-complementary-is-top-level (moderate): Ensure the complementary landmark or aside is at top level (Best Practice - does not affect pass/fail)
Sample 2 (GPT-5.2)
Instruction set: 2. Detailed Instructions
PASS | Latency 0.00s
Axe WCAG: 0 | BP: 6
Assertions ✅
- ✅: Has an h1 (R): pass
- ✅: Has single h1 (BP): pass
- ✅: Has at least one h2 (R): pass
- ✅: Has a single banner (R): pass
- ✅: Has a single maincontent (R): pass
- ✅: Has at least one navigation (R): pass
- ✅: Has a single footer (R): pass
Axe Best Practice Issues (6) ⚠️
- aria-allowed-role (minor): Ensure role attribute has an appropriate value for the element (Best Practice - does not affect pass/fail)
Sample 3 (GPT-5.2)
Instruction set: 2. Detailed Instructions
PASS | Latency 0.00s
Axe WCAG: 0 | BP: 8
Assertions ✅
- ✅: Has an h1 (R): pass
- ✅: Has single h1 (BP): pass
- ✅: Has at least one h2 (R): pass
- ✅: Has a single banner (R): pass
- ✅: Has a single maincontent (R): pass
- ✅: Has at least one navigation (R): pass
- ✅: Has a single footer (R): pass
Axe Best Practice Issues (8) ⚠️
- aria-allowed-role (minor): Ensure role attribute has an appropriate value for the element (Best Practice - does not affect pass/fail)
Sample 4 (GPT-5.2)
Instruction set: 2. Detailed Instructions
PASS | Latency 0.00s
Axe WCAG: 0 | BP: 12
Assertions ✅
- ✅: Has an h1 (R): pass
- ✅: Has single h1 (BP): pass
- ✅: Has at least one h2 (R): pass
- ✅: Has a single banner (R): pass
- ✅: Has a single maincontent (R): pass
- ✅: Has at least one navigation (R): pass
- ✅: Has a single footer (R): pass
Axe Best Practice Issues (12) ⚠️
- aria-allowed-role (minor): Ensure role attribute has an appropriate value for the element (Best Practice - does not affect pass/fail)
- landmark-complementary-is-top-level (moderate): Ensure the complementary landmark or aside is at top level (Best Practice - does not affect pass/fail)
GPT-5.2 Codex
Pass rates by instruction set: 8%, 40%, 80%, 100%
Aggregates are shown when filtering to a specific instruction set.
Samples: 25 | Passes: 2
| pass@1 | pass@5 | pass@10 |
|---|---|---|
| 8% | 37% | 65% |
Samples: 5 | Passes: 2
| pass@1 | pass@5 | pass@10 |
|---|---|---|
| 40% | 100% | 100% |
Samples: 5 | Passes: 4
| pass@1 | pass@5 | pass@10 |
|---|---|---|
| 80% | 100% | 100% |
Samples: 5 | Passes: 5
| pass@1 | pass@5 | pass@10 |
|---|---|---|
| 100% | 100% | 100% |
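The pass@k values in these tables match the commonly used unbiased estimator 1 - C(n-c, k) / C(n, k), where n is the number of samples and c the number of passes. A minimal Python sketch follows. The function name is illustrative, and the handling of k larger than n (returning 100% when there is at least one pass and 0% when there are none) is an assumption chosen to reproduce the figures shown, not a confirmed detail of the report's implementation.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Estimate the probability that at least one of k drawn samples passes.

    n: total samples, c: passing samples, k: number of samples drawn.
    """
    if c == 0:
        return 0.0          # assumption: no passes always reports 0%
    if n - c < k:
        return 1.0          # fewer than k failures exist, so a pass is certain
    return 1.0 - comb(n - c, k) / comb(n, k)


# Reproduce the 25-sample table above (2 passes).
for k in (1, 5, 10):
    print(k, f"{pass_at_k(25, 2, k):.0%}")
```

With n = 25 and c = 2 this gives 8%, 37%, and 65%, and with n = 5 it gives 40%, 80%, and 100% at pass@1 for 2, 4, and 5 passes respectively.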
Sample 0 (GPT-5.2 Codex)
Instruction set: Control
PASS | Latency 0.00s
Axe WCAG: 0 | BP: 1
Assertions ✅
- ✅: Has an h1 (R): pass
- ✅: Has single h1 (BP): pass
- ✅: Has at least one h2 (R): pass
- ✅: Has a single banner (R): pass
- ✅: Has a single maincontent (R): pass
- ✅: Has at least one navigation (R): pass
- ✅: Has a single footer (R): pass
Axe Best Practice Issues (1) ⚠️
- heading-order (moderate): Ensure the order of headings is semantically correct (Best Practice - does not affect pass/fail)
Sample 1 (GPT-5.2 Codex)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 13 | BP: 26
Assertions ❌
- ✅: Has an h1 (R): pass
- ✅: Has single h1 (BP): pass
- ✅: Has at least one h2 (R): pass
- ✅: Has a single banner (R): pass
- ❌: Has a single maincontent (R): fail
- ❌: Has at least one navigation (R): fail
- ✅: Has a single footer (R): pass
Axe WCAG Failures (13) ❌
- (13x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (26) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
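The structural assertions listed for each sample (heading and landmark counts) could be approximated with a small HTML scan. The sketch below is only an assumption about how such checks might work, not the benchmark's implementation: it uses Python's standard `html.parser`, treats `<header>` and `<footer>` as banner and contentinfo landmarks only when they are not nested inside sectioning elements, and ignores many edge cases a real accessibility tree would handle.

```python
from html.parser import HTMLParser

# Inside these elements, <header>/<footer> are not page-level landmarks.
SECTIONING = {"article", "aside", "main", "nav", "section"}


class LandmarkCounter(HTMLParser):
    """Counts headings and landmarks in a simplified way (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.depth = 0  # number of sectioning elements currently open
        self.counts = {"h1": 0, "h2": 0, "banner": 0, "main": 0,
                       "navigation": 0, "contentinfo": 0}

    def handle_starttag(self, tag, attrs):
        role = dict(attrs).get("role")
        top_level = self.depth == 0
        if tag in ("h1", "h2"):
            self.counts[tag] += 1
        if tag == "main" or role == "main":
            self.counts["main"] += 1
        if tag == "nav" or role == "navigation":
            self.counts["navigation"] += 1
        if role == "banner" or (tag == "header" and role is None and top_level):
            self.counts["banner"] += 1
        if role == "contentinfo" or (tag == "footer" and role is None and top_level):
            self.counts["contentinfo"] += 1
        if tag in SECTIONING:
            self.depth += 1

    def handle_endtag(self, tag):
        if tag in SECTIONING and self.depth > 0:
            self.depth -= 1


def check(html: str) -> dict:
    """Return the seven assertion results for an HTML string."""
    parser = LandmarkCounter()
    parser.feed(html)
    c = parser.counts
    return {
        "Has an h1": c["h1"] >= 1,
        "Has single h1": c["h1"] == 1,
        "Has at least one h2": c["h2"] >= 1,
        "Has a single banner": c["banner"] == 1,
        "Has a single maincontent": c["main"] == 1,
        "Has at least one navigation": c["navigation"] >= 1,
        "Has a single footer": c["contentinfo"] == 1,
    }


# Hypothetical page with no <main> or <nav>, the pattern flagged in Sample 1.
page = "<header><h1>T</h1></header><div><h2>S</h2></div><footer></footer>"
for name, ok in check(page).items():
    print("pass" if ok else "fail", name)
```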
Sample 2 (GPT-5.2 Codex)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 7 | BP: 6
Assertions ❌
- ✅: Has an h1 (R): pass
- ✅: Has single h1 (BP): pass
- ✅: Has at least one h2 (R): pass
- ✅: Has a single banner (R): pass
- ❌: Has a single maincontent (R): fail
- ✅: Has at least one navigation (R): pass
- ✅: Has a single footer (R): pass
Axe WCAG Failures (7) ❌
- (7x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (6) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 3 (GPT-5.2 Codex)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 2 | BP: 5
Assertions ❌
- ✅: Has an h1 (R): pass
- ✅: Has single h1 (BP): pass
- ✅: Has at least one h2 (R): pass
- ✅: Has a single banner (R): pass
- ❌: Has a single maincontent (R): fail
- ✅: Has at least one navigation (R): pass
- ✅: Has a single footer (R): pass
Axe WCAG Failures (2) ❌
- (2x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (5) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 4 (GPT-5.2 Codex)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 17 | BP: 8
Assertions ❌
- ✅: Has an h1 (R): pass
- ✅: Has single h1 (BP): pass
- ✅: Has at least one h2 (R): pass
- ✅: Has a single banner (R): pass
- ❌: Has a single maincontent (R): fail
- ❌: Has at least one navigation (R): fail
- ✅: Has a single footer (R): pass
Axe WCAG Failures (17) ❌
- (17x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (8) ⚠️
- heading-order (moderate): Ensure the order of headings is semantically correct (Best Practice - does not affect pass/fail)
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 5 (GPT-5.2 Codex)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 25 | BP: 2
Assertions ✅
- ✅: Has an h1 (R): pass
- ✅: Has single h1 (BP): pass
- ✅: Has at least one h2 (R): pass
- ✅: Has a single banner (R): pass
- ✅: Has a single maincontent (R): pass
- ✅: Has at least one navigation (R): pass
- ✅: Has a single footer (R): pass
Axe WCAG Failures (25) ❌
- (25x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (2) ⚠️
- heading-order (moderate): Ensure the order of headings is semantically correct (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 6 (GPT-5.2 Codex)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 22 | BP: 36
Assertions ❌
- ✅: Has an h1 (R): pass
- ✅: Has single h1 (BP): pass
- ✅: Has at least one h2 (R): pass
- ✅: Has a single banner (R): pass
- ❌: Has a single maincontent (R): fail
- ✅: Has at least one navigation (R): pass
- ✅: Has a single footer (R): pass
Axe WCAG Failures (22) ❌
- (22x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (36) ⚠️
- heading-order (moderate): Ensure the order of headings is semantically correct (Best Practice - does not affect pass/fail)
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 7 (GPT-5.2 Codex)
Instruction set: Control
PASS | Latency 0.00s
Axe WCAG: 0 | BP: 2
Assertions ✅
- ✅: Has an h1 (R): pass
- ✅: Has single h1 (BP): pass
- ✅: Has at least one h2 (R): pass
- ✅: Has a single banner (R): pass
- ✅: Has a single maincontent (R): pass
- ✅: Has at least one navigation (R): pass
- ✅: Has a single footer (R): pass
Axe Best Practice Issues (2) ⚠️
- heading-order (moderate): Ensure the order of headings is semantically correct (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 8 (GPT-5.2 Codex)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 6 | BP: 7
Assertions ❌
- ✅: Has an h1 (R): pass
- ✅: Has single h1 (BP): pass
- ✅: Has at least one h2 (R): pass
- ✅: Has a single banner (R): pass
- ❌: Has a single maincontent (R): fail
- ✅: Has at least one navigation (R): pass
- ✅: Has a single footer (R): pass
Axe WCAG Failures (6) ❌
- (6x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (7) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 9 (GPT-5.2 Codex)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 10 | BP: 16
Assertions ❌
- ✅: Has an h1 (R): pass
- ✅: Has single h1 (BP): pass
- ✅: Has at least one h2 (R): pass
- ✅: Has a single banner (R): pass
- ❌: Has a single maincontent (R): fail
- ✅: Has at least one navigation (R): pass
- ✅: Has a single footer (R): pass
Axe WCAG Failures (10) ❌
- (10x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (16) ⚠️
- heading-order (moderate): Ensure the order of headings is semantically correct (Best Practice - does not affect pass/fail)
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 10 (GPT-5.2 Codex)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 9 | BP: 5
Assertions ❌
- ✅: Has an h1 (R): pass
- ✅: Has single h1 (BP): pass
- ✅: Has at least one h2 (R): pass
- ✅: Has a single banner (R): pass
- ❌: Has a single maincontent (R): fail
- ✅: Has at least one navigation (R): pass
- ✅: Has a single footer (R): pass
Axe WCAG Failures (9) ❌
- (9x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (5) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 11 (GPT-5.2 Codex)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 9 | BP: 21
Assertions ❌
- ✅: Has an h1 (R): pass
- ✅: Has single h1 (BP): pass
- ✅: Has at least one h2 (R): pass
- ✅: Has a single banner (R): pass
- ❌: Has a single maincontent (R): fail
- ✅: Has at least one navigation (R): pass
- ✅: Has a single footer (R): pass
Axe WCAG Failures (9) ❌
- (9x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (21) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 12 (GPT-5.2 Codex)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 7 | BP: 3
Assertions ❌
- ✅: Has an h1 (R): pass
- ✅: Has single h1 (BP): pass
- ✅: Has at least one h2 (R): pass
- ✅: Has a single banner (R): pass
- ✅: Has a single maincontent (R): pass
- ❌: Has at least one navigation (R): fail
- ✅: Has a single footer (R): pass
Axe WCAG Failures (7) ❌
- (7x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (3) ⚠️
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 13 (GPT-5.2 Codex)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 1 | BP: 14
Assertions ❌
- ✅: Has an h1 (R): pass
- ✅: Has single h1 (BP): pass
- ✅: Has at least one h2 (R): pass
- ✅: Has a single banner (R): pass
- ❌: Has a single maincontent (R): fail
- ❌: Has at least one navigation (R): fail
- ✅: Has a single footer (R): pass
Axe WCAG Failures (1) ❌
- (1x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (14) ⚠️
- heading-order (moderate): Ensure the order of headings is semantically correct (Best Practice - does not affect pass/fail)
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 14 (GPT-5.2 Codex)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 7 | BP: 8
Assertions ❌
- ✅: Has an h1 (R): pass
- ✅: Has single h1 (BP): pass
- ✅: Has at least one h2 (R): pass
- ✅: Has a single banner (R): pass
- ❌: Has a single maincontent (R): fail
- ✅: Has at least one navigation (R): pass
- ✅: Has a single footer (R): pass
Axe WCAG Failures (7) ❌
- (7x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (8) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 15 (GPT-5.2 Codex)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 17 | BP: 7
Assertions ❌
- ✅: Has an h1 (R): pass
- ✅: Has single h1 (BP): pass
- ✅: Has at least one h2 (R): pass
- ✅: Has a single banner (R): pass
- ❌: Has a single maincontent (R): fail
- ✅: Has at least one navigation (R): pass
- ✅: Has a single footer (R): pass
Axe WCAG Failures (17) ❌
- (17x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (7) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 16 (GPT-5.2 Codex)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 0 | BP: 10
Assertions ❌
- ✅: Has an h1 (R): pass
- ✅: Has single h1 (BP): pass
- ✅: Has at least one h2 (R): pass
- ✅: Has a single banner (R): pass
- ❌: Has a single maincontent (R): fail
- ✅: Has at least one navigation (R): pass
- ✅: Has a single footer (R): pass
Axe Best Practice Issues (10) ⚠️
- heading-order (moderate): Ensure the order of headings is semantically correct (Best Practice - does not affect pass/fail)
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 17 (GPT-5.2 Codex)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 3 | BP: 6
Assertions ❌
- ✅: Has an h1 (R): pass
- ✅: Has single h1 (BP): pass
- ✅: Has at least one h2 (R): pass
- ✅: Has a single banner (R): pass
- ❌: Has a single maincontent (R): fail
- ❌: Has at least one navigation (R): fail
- ✅: Has a single footer (R): pass
Axe WCAG Failures (3) ❌
- (3x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (6) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 18 (GPT-5.2 Codex)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 1
Assertions ✅
- ✅: Has an h1 (R): pass
- ✅: Has single h1 (BP): pass
- ✅: Has at least one h2 (R): pass
- ✅: Has a single banner (R): pass
- ✅: Has a single maincontent (R): pass
- ✅: Has at least one navigation (R): pass
- ✅: Has a single footer (R): pass
Axe WCAG Failures (1) ❌
- (1x) - aria-prohibited-attr (serious): Ensure ARIA attributes are not prohibited for an element's role
Sample 19 (GPT-5.2 Codex)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 3 | BP: 3
Assertions ✅
- ✅: Has an h1 (R): pass
- ✅: Has single h1 (BP): pass
- ✅: Has at least one h2 (R): pass
- ✅: Has a single banner (R): pass
- ✅: Has a single maincontent (R): pass
- ✅: Has at least one navigation (R): pass
- ✅: Has a single footer (R): pass
Axe WCAG Failures (3) ❌
- (3x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (3) ⚠️
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 20 (GPT-5.2 Codex)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 5 | BP: 5
Assertions ❌
- ✅: Has an h1 (R): pass
- ✅: Has single h1 (BP): pass
- ✅: Has at least one h2 (R): pass
- ✅: Has a single banner (R): pass
- ❌: Has a single maincontent (R): fail
- ✅: Has at least one navigation (R): pass
- ✅: Has a single footer (R): pass
Axe WCAG Failures (5) ❌
- (5x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (5) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 21 (GPT-5.2 Codex)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 8 | BP: 7
Assertions ❌
- ✅: Has an h1 (R): pass
- ✅: Has single h1 (BP): pass
- ✅: Has at least one h2 (R): pass
- ✅: Has a single banner (R): pass
- ❌: Has a single maincontent (R): fail
- ✅: Has at least one navigation (R): pass
- ✅: Has a single footer (R): pass
Axe WCAG Failures (8) ❌
- (8x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (7) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 22 (GPT-5.2 Codex)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 12 | BP: 5
Assertions ❌
- ✅: Has an h1 (R): pass
- ✅: Has single h1 (BP): pass
- ✅: Has at least one h2 (R): pass
- ✅: Has a single banner (R): pass
- ❌: Has a single maincontent (R): fail
- ❌: Has at least one navigation (R): fail
- ✅: Has a single footer (R): pass
Axe WCAG Failures (12) ❌
- (12x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (5) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 23 (GPT-5.2 Codex)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 8 | BP: 5
Assertions ❌
- ✅: Has an h1 (R): pass
- ✅: Has single h1 (BP): pass
- ✅: Has at least one h2 (R): pass
- ✅: Has a single banner (R): pass
- ❌: Has a single maincontent (R): fail
- ❌: Has at least one navigation (R): fail
- ✅: Has a single footer (R): pass
Axe WCAG Failures (8) ❌
- (8x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (5) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 24 (GPT-5.2 Codex)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 4 | BP: 36
Assertions ❌
- ✅: Has an h1 (R): pass
- ✅: Has single h1 (BP): pass
- ✅: Has at least one h2 (R): pass
- ✅: Has a single banner (R): pass
- ❌: Has a single maincontent (R): fail
- ✅: Has at least one navigation (R): pass
- ✅: Has a single footer (R): pass
Axe WCAG Failures (4) ❌
- (4x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (36) ⚠️
- heading-order (moderate): Ensure the order of headings is semantically correct (Best Practice - does not affect pass/fail)
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
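The Control samples above seem to follow one verdict rule: a sample is marked PASS only when Axe reports zero WCAG failures and every assertion tagged (R) passes, while items tagged (BP) and Axe best-practice issues do not affect the verdict. The sketch below is a minimal illustration of that inferred rule, not the benchmark's actual code. The names `Assertion` and `sample_passes` are hypothetical, and reading (R) as "required" is an assumption.

```python
from dataclasses import dataclass


@dataclass
class Assertion:
    name: str
    required: bool  # assumed: True for "(R)" items, False for "(BP)" items
    passed: bool


def sample_passes(axe_wcag_failures: int, assertions: list[Assertion]) -> bool:
    """Inferred rule: no Axe WCAG failures and no failed required assertion."""
    required_ok = all(a.passed for a in assertions if a.required)
    return axe_wcag_failures == 0 and required_ok


def make_assertions(main_ok: bool = True, nav_ok: bool = True) -> list[Assertion]:
    """Build the seven assertions listed for each sample in this report."""
    return [
        Assertion("Has an h1", True, True),
        Assertion("Has single h1", False, True),
        Assertion("Has at least one h2", True, True),
        Assertion("Has a single banner", True, True),
        Assertion("Has a single maincontent", True, main_ok),
        Assertion("Has at least one navigation", True, nav_ok),
        Assertion("Has a single footer", True, True),
    ]


# Control Sample 7: Axe WCAG 0, all assertions pass (reported PASS)
print(sample_passes(0, make_assertions()))
# Control Sample 16: Axe WCAG 0, maincontent assertion fails (reported FAIL)
print(sample_passes(0, make_assertions(main_ok=False)))
# Control Sample 5: Axe WCAG 25, all assertions pass (reported FAIL)
print(sample_passes(25, make_assertions()))
```

Under these assumptions the three calls agree with the PASS/FAIL verdicts reported for Control Samples 7, 16, and 5.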
Sample 0 (GPT-5.2 Codex)
Instruction set: 1. Basic
PASS | Latency 0.00s
Axe WCAG: 0
Assertions ✅
- ✅: Has an h1 (R): pass
- ✅: Has single h1 (BP): pass
- ✅: Has at least one h2 (R): pass
- ✅: Has a single banner (R): pass
- ✅: Has a single maincontent (R): pass
- ✅: Has at least one navigation (R): pass
- ✅: Has a single footer (R): pass
Sample 1 (GPT-5.2 Codex)
Instruction set: 1. Basic
PASS | Latency 0.00s
Axe WCAG: 0 | BP: 1
Assertions ✅
- ✅: Has an h1 (R): pass
- ✅: Has single h1 (BP): pass
- ✅: Has at least one h2 (R): pass
- ✅: Has a single banner (R): pass
- ✅: Has a single maincontent (R): pass
- ✅: Has at least one navigation (R): pass
- ✅: Has a single footer (R): pass
Axe Best Practice Issues (1) ⚠️
- heading-order (moderate): Ensure the order of headings is semantically correct (Best Practice - does not affect pass/fail)
Sample 2 (GPT-5.2 Codex)
Instruction set: 1. Basic
PASS | Latency 0.00s
Axe WCAG: 0
Assertions ✅
- ✅: Has an h1 (R): pass
- ✅: Has single h1 (BP): pass
- ✅: Has at least one h2 (R): pass
- ✅: Has a single banner (R): pass
- ✅: Has a single maincontent (R): pass
- ✅: Has at least one navigation (R): pass
- ✅: Has a single footer (R): pass
Sample 3 (GPT-5.2 Codex)
Instruction set: 1. Basic
FAIL | Latency 0.00s
Axe WCAG: 3
Assertions ✅
- ✅: Has an h1 (R): pass
- ✅: Has single h1 (BP): pass
- ✅: Has at least one h2 (R): pass
- ✅: Has a single banner (R): pass
- ✅: Has a single maincontent (R): pass
- ✅: Has at least one navigation (R): pass
- ✅: Has a single footer (R): pass
Axe WCAG Failures (3) ❌
- (3x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Sample 4 (GPT-5.2 Codex)
Instruction set: 1. Basic
PASS | Latency 0.00s
Axe WCAG: 0
Assertions ✅
- ✅: Has an h1 (R): pass
- ✅: Has single h1 (BP): pass
- ✅: Has at least one h2 (R): pass
- ✅: Has a single banner (R): pass
- ✅: Has a single maincontent (R): pass
- ✅: Has at least one navigation (R): pass
- ✅: Has a single footer (R): pass
Sample 0 (GPT-5.2 Codex)
Instruction set: 0. Minimal
FAIL | Latency 0.00s
Axe WCAG: 7 | BP: 11
Assertions ✅
- ✅: Has an h1 (R): pass
- ✅: Has single h1 (BP): pass
- ✅: Has at least one h2 (R): pass
- ✅: Has a single banner (R): pass
- ✅: Has a single maincontent (R): pass
- ✅: Has at least one navigation (R): pass
- ✅: Has a single footer (R): pass
Axe WCAG Failures (7) ❌
- (7x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (11) ⚠️
- aria-allowed-role (minor): Ensure role attribute has an appropriate value for the element (Best Practice - does not affect pass/fail)
Sample 1 (GPT-5.2 Codex)
Instruction set: 0. Minimal
FAIL | Latency 0.00s
Axe WCAG: 3
Assertions ✅
- ✅: Has an h1 (R): pass
- ✅: Has single h1 (BP): pass
- ✅: Has at least one h2 (R): pass
- ✅: Has a single banner (R): pass
- ✅: Has a single maincontent (R): pass
- ✅: Has at least one navigation (R): pass
- ✅: Has a single footer (R): pass
Axe WCAG Failures (3) ❌
- (3x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Sample 2 (GPT-5.2 Codex)
Instruction set: 0. Minimal
PASS | Latency 0.00s
Axe WCAG: 0
Assertions ✅
- ✅: Has an h1 (R): pass
- ✅: Has single h1 (BP): pass
- ✅: Has at least one h2 (R): pass
- ✅: Has a single banner (R): pass
- ✅: Has a single maincontent (R): pass
- ✅: Has at least one navigation (R): pass
- ✅: Has a single footer (R): pass
Sample 3 (GPT-5.2 Codex)
Instruction set: 0. Minimal
FAIL | Latency 0.00s
Axe WCAG: 4 | BP: 10
Assertions ✅
- ✅: Has an h1 (R): pass
- ✅: Has single h1 (BP): pass
- ✅: Has at least one h2 (R): pass
- ✅: Has a single banner (R): pass
- ✅: Has a single maincontent (R): pass
- ✅: Has at least one navigation (R): pass
- ✅: Has a single footer (R): pass
Axe WCAG Failures (4) ❌
- (4x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (10) ⚠️
- aria-allowed-role (minor): Ensure role attribute has an appropriate value for the element (Best Practice - does not affect pass/fail)
Sample 4 (GPT-5.2 Codex)
Instruction set: 0. Minimal
PASS | Latency 0.00s
Axe WCAG: 0 | BP: 10
Assertions ✅
- ✅: Has an h1 (R): pass
- ✅: Has single h1 (BP): pass
- ✅: Has at least one h2 (R): pass
- ✅: Has a single banner (R): pass
- ✅: Has a single maincontent (R): pass
- ✅: Has at least one navigation (R): pass
- ✅: Has a single footer (R): pass
Axe Best Practice Issues (10) ⚠️
- aria-allowed-role (minor): Ensure role attribute has an appropriate value for the element (Best Practice - does not affect pass/fail)
Sample 0 (GPT-5.2 Codex)
Instruction set: 2. Detailed Instructions
PASS | Latency 0.00s
Axe WCAG: 0
Assertions ✅
- ✅: Has an h1 (R): pass
- ✅: Has single h1 (BP): pass
- ✅: Has at least one h2 (R): pass
- ✅: Has a single banner (R): pass
- ✅: Has a single maincontent (R): pass
- ✅: Has at least one navigation (R): pass
- ✅: Has a single footer (R): pass
Sample 1 (GPT-5.2 Codex)
Instruction set: 2. Detailed Instructions
PASS | Latency 0.00s
Axe WCAG: 0
Assertions ✅
- ✅: Has an h1 (R): pass
- ✅: Has single h1 (BP): pass
- ✅: Has at least one h2 (R): pass
- ✅: Has a single banner (R): pass
- ✅: Has a single maincontent (R): pass
- ✅: Has at least one navigation (R): pass
- ✅: Has a single footer (R): pass
Sample 2 (GPT-5.2 Codex)
Instruction set: 2. Detailed Instructions
PASS | Latency 0.00s
Axe WCAG: 0
Assertions ✅
- ✅: Has an h1 (R): pass
- ✅: Has single h1 (BP): pass
- ✅: Has at least one h2 (R): pass
- ✅: Has a single banner (R): pass
- ✅: Has a single maincontent (R): pass
- ✅: Has at least one navigation (R): pass
- ✅: Has a single footer (R): pass
Sample 3 (GPT-5.2 Codex)
Instruction set: 2. Detailed Instructions
PASS | Latency 0.00s
Axe WCAG: 0
Assertions ✅
- ✅: Has an h1 (R): pass
- ✅: Has single h1 (BP): pass
- ✅: Has at least one h2 (R): pass
- ✅: Has a single banner (R): pass
- ✅: Has a single maincontent (R): pass
- ✅: Has at least one navigation (R): pass
- ✅: Has a single footer (R): pass
Sample 4 (GPT-5.2 Codex)
Instruction set: 2. Detailed Instructions
PASS | Latency 0.00s
Axe WCAG: 0
Assertions ✅
- ✅: Has an h1 (R): pass
- ✅: Has single h1 (BP): pass
- ✅: Has at least one h2 (R): pass
- ✅: Has a single banner (R): pass
- ✅: Has a single maincontent (R): pass
- ✅: Has at least one navigation (R): pass
- ✅: Has a single footer (R): pass
Grok 4 Fast Non-Reasoning
— 0%
— 0%
— 0%
— 20%
Samples: 25 | Passes: 0
| pass@1 | pass@5 | pass@10 |
|---|---|---|
| 0% | 0% | 0% |
Samples: 5 | Passes: 0
| pass@1 | pass@5 | pass@10 |
|---|---|---|
| 0% | 0% | 0% |
Samples: 5 | Passes: 0
| pass@1 | pass@5 | pass@10 |
|---|---|---|
| 0% | 0% | 0% |
Samples: 5 | Passes: 1
| pass@1 | pass@5 | pass@10 |
|---|---|---|
| 20% | 100% | 100% |
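The pass@k rows above are consistent with the standard unbiased estimator, 1 - C(n-c, k) / C(n, k), where n is the number of samples and c the number of passes. The following is a minimal sketch under that assumption and is not taken from the benchmark's source. The function name `pass_at_k` is hypothetical, and clamping k to n (so that pass@10 is defined for 5 samples) is an assumption chosen to match the rows shown.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Estimate the chance that at least one of k samples, drawn without
    replacement from n samples of which c passed, is a pass."""
    if n < 1 or k < 1 or not 0 <= c <= n:
        raise ValueError("need n >= 1, k >= 1 and 0 <= c <= n")
    if c == 0:
        return 0.0  # no passing sample can ever be drawn
    k = min(k, n)  # assumption: k larger than n is treated as drawing all n
    if n - c < k:
        return 1.0  # too few failures to fill k draws, so a pass is certain
    return 1.0 - comb(n - c, k) / comb(n, k)


# (samples, passes) pairs taken from the aggregate rows above
for n, c in [(25, 0), (5, 0), (5, 1)]:
    row = " | ".join(f"{pass_at_k(n, c, k):.0%}" for k in (1, 5, 10))
    print(f"Samples: {n} | Passes: {c} -> {row}")
```

With 5 samples and 1 pass this gives 20%, 100%, and 100% for k = 1, 5, and 10, matching the last table, and any row with 0 passes gives 0% throughout.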
Sample 0 (Grok 4 Fast Non-Reasoning)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 3 | BP: 2
Assertions ❌
- ✅: Has an h1 (R): pass
- ✅: Has single h1 (BP): pass
- ✅: Has at least one h2 (R): pass
- ✅: Has a single banner (R): pass
- ❌: Has a single maincontent (R): fail
- ✅: Has at least one navigation (R): pass
- ✅: Has a single footer (R): pass
Axe WCAG Failures (3) ❌
- (3x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (2) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 1 (Grok 4 Fast Non-Reasoning)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 3 | BP: 5
Assertions ❌
- ✅: Has an h1 (R): pass
- ✅: Has single h1 (BP): pass
- ✅: Has at least one h2 (R): pass
- ✅: Has a single banner (R): pass
- ❌: Has a single maincontent (R): fail
- ✅: Has at least one navigation (R): pass
- ✅: Has a single footer (R): pass
Axe WCAG Failures (3) ❌
- (3x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (5) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 2 (Grok 4 Fast Non-Reasoning)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 7 | BP: 5
Assertions ❌
- ✅: Has an h1 (R): pass
- ✅: Has single h1 (BP): pass
- ✅: Has at least one h2 (R): pass
- ✅: Has a single banner (R): pass
- ❌: Has a single maincontent (R): fail
- ✅: Has at least one navigation (R): pass
- ✅: Has a single footer (R): pass
Axe WCAG Failures (7) ❌
- (7x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (5) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 3 (Grok 4 Fast Non-Reasoning)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 0 | BP: 2
Assertions ❌
- ✅: Has an h1 (R): pass
- ✅: Has single h1 (BP): pass
- ✅: Has at least one h2 (R): pass
- ✅: Has a single banner (R): pass
- ❌: Has a single maincontent (R): fail
- ✅: Has at least one navigation (R): pass
- ✅: Has a single footer (R): pass
Axe Best Practice Issues (2) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 4 (Grok 4 Fast Non-Reasoning)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 3 | BP: 2
Assertions ❌
- ✅: Has an h1 (R): pass
- ✅: Has single h1 (BP): pass
- ✅: Has at least one h2 (R): pass
- ✅: Has a single banner (R): pass
- ❌: Has a single maincontent (R): fail
- ✅: Has at least one navigation (R): pass
- ✅: Has a single footer (R): pass
Axe WCAG Failures (3) ❌
- (3x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (2) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 5 (Grok 4 Fast Non-Reasoning)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 3 | BP: 2
Assertions ❌
- ✅: Has an h1 (R): pass
- ✅: Has single h1 (BP): pass
- ✅: Has at least one h2 (R): pass
- ✅: Has a single banner (R): pass
- ❌: Has a single maincontent (R): fail
- ✅: Has at least one navigation (R): pass
- ✅: Has a single footer (R): pass
Axe WCAG Failures (3) ❌
- (3x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (2) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 6 (Grok 4 Fast Non-Reasoning)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 3 | BP: 2
Assertions ❌
- ✅: Has an h1 (R): pass
- ✅: Has single h1 (BP): pass
- ✅: Has at least one h2 (R): pass
- ✅: Has a single banner (R): pass
- ❌: Has a single maincontent (R): fail
- ✅: Has at least one navigation (R): pass
- ✅: Has a single footer (R): pass
Axe WCAG Failures (3) ❌
- (3x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (2) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 7 (Grok 4 Fast Non-Reasoning)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 3 | BP: 5
Assertions ❌
- ✅: Has an h1 (R): pass
- ✅: Has single h1 (BP): pass
- ✅: Has at least one h2 (R): pass
- ✅: Has a single banner (R): pass
- ❌: Has a single maincontent (R): fail
- ✅: Has at least one navigation (R): pass
- ✅: Has a single footer (R): pass
Axe WCAG Failures (3) ❌
- (3x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (5) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 8 (Grok 4 Fast Non-Reasoning)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 0 | BP: 2
Assertions ❌
- ✅: Has an h1 (R): pass
- ✅: Has single h1 (BP): pass
- ✅: Has at least one h2 (R): pass
- ✅: Has a single banner (R): pass
- ❌: Has a single maincontent (R): fail
- ✅: Has at least one navigation (R): pass
- ✅: Has a single footer (R): pass
Axe Best Practice Issues (2) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 9 (Grok 4 Fast Non-Reasoning)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 0 | BP: 2
Assertions ❌
- ✅: Has an h1 (R): pass
- ✅: Has single h1 (BP): pass
- ✅: Has at least one h2 (R): pass
- ✅: Has a single banner (R): pass
- ❌: Has a single maincontent (R): fail
- ✅: Has at least one navigation (R): pass
- ✅: Has a single footer (R): pass
Axe Best Practice Issues (2) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 10 (Grok 4 Fast Non-Reasoning)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 2 | BP: 2
Assertions ❌
- ✅: Has an h1 (R): pass
- ✅: Has single h1 (BP): pass
- ✅: Has at least one h2 (R): pass
- ✅: Has a single banner (R): pass
- ❌: Has a single maincontent (R): fail
- ✅: Has at least one navigation (R): pass
- ✅: Has a single footer (R): pass
Axe WCAG Failures (2) ❌
- (2x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (2) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 11 (Grok 4 Fast Non-Reasoning)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 3 | BP: 2
Assertions ❌
- ✅: Has an h1 (R): pass
- ✅: Has single h1 (BP): pass
- ✅: Has at least one h2 (R): pass
- ✅: Has a single banner (R): pass
- ❌: Has a single maincontent (R): fail
- ✅: Has at least one navigation (R): pass
- ✅: Has a single footer (R): pass
Axe WCAG Failures (3) ❌
- (3x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (2) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 12 (Grok 4 Fast Non-Reasoning)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 2 | BP: 2
Assertions ❌
- ✅: Has an h1 (R): pass
- ✅: Has single h1 (BP): pass
- ✅: Has at least one h2 (R): pass
- ✅: Has a single banner (R): pass
- ❌: Has a single maincontent (R): fail
- ✅: Has at least one navigation (R): pass
- ✅: Has a single footer (R): pass
Axe WCAG Failures (2) ❌
- (2x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (2) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 13 (Grok 4 Fast Non-Reasoning)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 0 | BP: 2
Assertions ❌
- ✅: Has an h1 (R): pass
- ✅: Has single h1 (BP): pass
- ✅: Has at least one h2 (R): pass
- ✅: Has a single banner (R): pass
- ❌: Has a single maincontent (R): fail
- ✅: Has at least one navigation (R): pass
- ✅: Has a single footer (R): pass
Axe Best Practice Issues (2) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 14 (Grok 4 Fast Non-Reasoning)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 3 | BP: 2
Assertions ❌
- ✅: Has an h1 (R): pass
- ✅: Has single h1 (BP): pass
- ✅: Has at least one h2 (R): pass
- ✅: Has a single banner (R): pass
- ❌: Has a single maincontent (R): fail
- ✅: Has at least one navigation (R): pass
- ✅: Has a single footer (R): pass
Axe WCAG Failures (3) ❌
- (3x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (2) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 15 (Grok 4 Fast Non-Reasoning)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 3 | BP: 2
Assertions ❌
- ✅: Has an h1 (R): pass
- ✅: Has single h1 (BP): pass
- ✅: Has at least one h2 (R): pass
- ✅: Has a single banner (R): pass
- ❌: Has a single maincontent (R): fail
- ✅: Has at least one navigation (R): pass
- ✅: Has a single footer (R): pass
Axe WCAG Failures (3) ❌
- (3x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (2) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 16 (Grok 4 Fast Non-Reasoning)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 7 | BP: 5
Assertions ❌
- ✅: Has an h1 (R): pass
- ✅: Has single h1 (BP): pass
- ✅: Has at least one h2 (R): pass
- ✅: Has a single banner (R): pass
- ❌: Has a single maincontent (R): fail
- ✅: Has at least one navigation (R): pass
- ✅: Has a single footer (R): pass
Axe WCAG Failures (7) ❌
- (7x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (5) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 17 (Grok 4 Fast Non-Reasoning)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 0 | BP: 5
Assertions ❌
- ✅: Has an h1 (R): pass
- ✅: Has single h1 (BP): pass
- ✅: Has at least one h2 (R): pass
- ✅: Has a single banner (R): pass
- ❌: Has a single maincontent (R): fail
- ✅: Has at least one navigation (R): pass
- ✅: Has a single footer (R): pass
Axe Best Practice Issues (5) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 18 (Grok 4 Fast Non-Reasoning)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 2 | BP: 2
Assertions ❌
- ✅: Has an h1 (R): pass
- ✅: Has single h1 (BP): pass
- ✅: Has at least one h2 (R): pass
- ✅: Has a single banner (R): pass
- ❌: Has a single maincontent (R): fail
- ✅: Has at least one navigation (R): pass
- ✅: Has a single footer (R): pass
Axe WCAG Failures (2) ❌
- (2x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (2) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 19 (Grok 4 Fast Non-Reasoning)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 3 | BP: 2
Assertions ❌
- ✅: Has an h1 (R): pass
- ✅: Has single h1 (BP): pass
- ✅: Has at least one h2 (R): pass
- ✅: Has a single banner (R): pass
- ❌: Has a single maincontent (R): fail
- ✅: Has at least one navigation (R): pass
- ✅: Has a single footer (R): pass
Axe WCAG Failures (3) ❌
- (3x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (2) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 20 (Grok 4 Fast Non-Reasoning)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 3 | BP: 2
Assertions ❌
- ✅: Has an h1 (R): pass
- ✅: Has single h1 (BP): pass
- ✅: Has at least one h2 (R): pass
- ✅: Has a single banner (R): pass
- ❌: Has a single maincontent (R): fail
- ✅: Has at least one navigation (R): pass
- ✅: Has a single footer (R): pass
Axe WCAG Failures (3) ❌
- (3x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (2) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 21 (Grok 4 Fast Non-Reasoning)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 0 | BP: 5
Assertions ❌
- ✅: Has an h1 (R): pass
- ❌: Has single h1 (BP): fail
- ✅: Has at least one h2 (R): pass
- ✅: Has a single banner (R): pass
- ❌: Has a single maincontent (R): fail
- ✅: Has at least one navigation (R): pass
- ✅: Has a single footer (R): pass
Axe Best Practice Issues (5) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 22 (Grok 4 Fast Non-Reasoning)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 2 | BP: 6
Assertions ❌
- ✅: Has an h1 (R): pass
- ✅: Has single h1 (BP): pass
- ✅: Has at least one h2 (R): pass
- ✅: Has a single banner (R): pass
- ❌: Has a single maincontent (R): fail
- ✅: Has at least one navigation (R): pass
- ✅: Has a single footer (R): pass
Axe WCAG Failures (2) ❌
- (2x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (6) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 23 (Grok 4 Fast Non-Reasoning)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 8 | BP: 6
Assertions ❌
- ✅: Has an h1 (R): pass
- ✅: Has single h1 (BP): pass
- ✅: Has at least one h2 (R): pass
- ✅: Has a single banner (R): pass
- ❌: Has a single maincontent (R): fail
- ✅: Has at least one navigation (R): pass
- ✅: Has a single footer (R): pass
Axe WCAG Failures (8) ❌
- (8x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (6) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 24 (Grok 4 Fast Non-Reasoning)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 3 | BP: 2
Assertions ❌
- ✅: Has an h1 (R): pass
- ✅: Has single h1 (BP): pass
- ✅: Has at least one h2 (R): pass
- ✅: Has a single banner (R): pass
- ❌: Has a single maincontent (R): fail
- ✅: Has at least one navigation (R): pass
- ✅: Has a single footer (R): pass
Axe WCAG Failures (3) ❌
- (3x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (2) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
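Every Control sample above fails the `Has a single maincontent` assertion and triggers axe's `landmark-one-main` rule. As a rough illustration of what such a check looks for, the sketch below counts elements that expose a main landmark (`<main>` or `role="main"`). It is not the benchmark's or axe-core's implementation: the names `MainLandmarkCounter` and `has_single_main` are hypothetical, and the sketch ignores hidden elements and other accessibility-tree details.

```python
from html.parser import HTMLParser


class MainLandmarkCounter(HTMLParser):
    """Counts start tags that expose a main landmark: <main> or role="main"."""

    def __init__(self):
        super().__init__()
        self.count = 0

    def handle_starttag(self, tag, attrs):
        # Attribute values may be None (valueless attributes), so default to "".
        role = (dict(attrs).get("role") or "").strip().lower()
        if tag == "main" or role == "main":
            self.count += 1


def has_single_main(html: str) -> bool:
    """Return True when the markup contains exactly one main landmark."""
    counter = MainLandmarkCounter()
    counter.feed(html)
    return counter.count == 1
```

A page that wraps its primary content in one `<main>` element would satisfy this sketch, while a page that uses only generic `div` containers, or more than one main landmark, would not.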
Sample 0 (Grok 4 Fast Non-Reasoning)
Instruction set: 1. Basic
FAIL | Latency 0.00s
Axe WCAG: 10
Assertions ✅
- ✅: Has an h1 (R): pass
- ✅: Has single h1 (BP): pass
- ✅: Has at least one h2 (R): pass
- ✅: Has a single banner (R): pass
- ✅: Has a single maincontent (R): pass
- ✅: Has at least one navigation (R): pass
- ✅: Has a single footer (R): pass
Axe WCAG Failures (10) ❌
- (10x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Sample 1 (Grok 4 Fast Non-Reasoning)
Instruction set: 1. Basic
FAIL | Latency 0.00s
Axe WCAG: 13
Assertions ✅
- ✅: Has an h1 (R): pass
- ✅: Has single h1 (BP): pass
- ✅: Has at least one h2 (R): pass
- ✅: Has a single banner (R): pass
- ✅: Has a single maincontent (R): pass
- ✅: Has at least one navigation (R): pass
- ✅: Has a single footer (R): pass
Axe WCAG Failures (13) ❌
- (13x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Sample 2 (Grok 4 Fast Non-Reasoning)
Instruction set: 1. Basic
FAIL | Latency 0.00s
Axe WCAG: 10
Assertions ✅
- ✅: Has an h1 (R): pass
- ✅: Has single h1 (BP): pass
- ✅: Has at least one h2 (R): pass
- ✅: Has a single banner (R): pass
- ✅: Has a single maincontent (R): pass
- ✅: Has at least one navigation (R): pass
- ✅: Has a single footer (R): pass
Axe WCAG Failures (10) ❌
- (10x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Sample 3 (Grok 4 Fast Non-Reasoning)
Instruction set: 1. Basic
FAIL | Latency 0.00s
Axe WCAG: 14
Assertions ✅
- ✅: Has an h1 (R): pass
- ✅: Has single h1 (BP): pass
- ✅: Has at least one h2 (R): pass
- ✅: Has a single banner (R): pass
- ✅: Has a single maincontent (R): pass
- ✅: Has at least one navigation (R): pass
- ✅: Has a single footer (R): pass
Axe WCAG Failures (14) ❌
- (14x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Sample 4 (Grok 4 Fast Non-Reasoning)
Instruction set: 1. Basic
FAIL | Latency 0.00s
Axe WCAG: 3
Assertions ✅
- ✅: Has an h1 (R): pass
- ✅: Has single h1 (BP): pass
- ✅: Has at least one h2 (R): pass
- ✅: Has a single banner (R): pass
- ✅: Has a single maincontent (R): pass
- ✅: Has at least one navigation (R): pass
- ✅: Has a single footer (R): pass
Axe WCAG Failures (3) ❌
- (3x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Sample 0 (Grok 4 Fast Non-Reasoning)
Instruction set: 0. Minimal
FAIL | Latency 0.00s
Axe WCAG: 8
Assertions ✅
- ✅: Has an h1 (R): pass
- ✅: Has single h1 (BP): pass
- ✅: Has at least one h2 (R): pass
- ✅: Has a single banner (R): pass
- ✅: Has a single maincontent (R): pass
- ✅: Has at least one navigation (R): pass
- ✅: Has a single footer (R): pass
Axe WCAG Failures (8) ❌
- (8x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Sample 1 (Grok 4 Fast Non-Reasoning)
Instruction set: 0. Minimal
FAIL | Latency 0.00s
Axe WCAG: 15
Assertions ✅
- ✅: Has an h1 (R): pass
- ✅: Has single h1 (BP): pass
- ✅: Has at least one h2 (R): pass
- ✅: Has a single banner (R): pass
- ✅: Has a single maincontent (R): pass
- ✅: Has at least one navigation (R): pass
- ✅: Has a single footer (R): pass
Axe WCAG Failures (15) ❌
- (15x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Sample 2 (Grok 4 Fast Non-Reasoning)
Instruction set: 0. Minimal
FAIL | Latency 0.00s
Axe WCAG: 7
Assertions ✅
- ✅: Has an h1 (R): pass
- ✅: Has single h1 (BP): pass
- ✅: Has at least one h2 (R): pass
- ✅: Has a single banner (R): pass
- ✅: Has a single maincontent (R): pass
- ✅: Has at least one navigation (R): pass
- ✅: Has a single footer (R): pass
Axe WCAG Failures (7) ❌
- (7x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Sample 3 (Grok 4 Fast Non-Reasoning)
Instruction set: 0. Minimal
FAIL | Latency 0.00s
Axe WCAG: 13
Assertions ✅
- ✅: Has an h1 (R): pass
- ✅: Has single h1 (BP): pass
- ✅: Has at least one h2 (R): pass
- ✅: Has a single banner (R): pass
- ✅: Has a single maincontent (R): pass
- ✅: Has at least one navigation (R): pass
- ✅: Has a single footer (R): pass
Axe WCAG Failures (13) ❌
- (13x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Sample 4 (Grok 4 Fast Non-Reasoning)
Instruction set: 0. Minimal
FAIL | Latency 0.00s
Axe WCAG: 15
Assertions ✅
- ✅: Has an h1 (R): pass
- ✅: Has single h1 (BP): pass
- ✅: Has at least one h2 (R): pass
- ✅: Has a single banner (R): pass
- ✅: Has a single maincontent (R): pass
- ✅: Has at least one navigation (R): pass
- ✅: Has a single footer (R): pass
Axe WCAG Failures (15) ❌
- (15x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Sample 0 (Grok 4 Fast Non-Reasoning)
Instruction set: 2. Detailed Instructions
FAIL | Latency 0.00s
Axe WCAG: 28
Assertions ✅
- ✅: Has an h1 (R): pass
- ✅: Has single h1 (BP): pass
- ✅: Has at least one h2 (R): pass
- ✅: Has a single banner (R): pass
- ✅: Has a single maincontent (R): pass
- ✅: Has at least one navigation (R): pass
- ✅: Has a single footer (R): pass
Axe WCAG Failures (28) ❌
- (28x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Sample 1 (Grok 4 Fast Non-Reasoning)
Instruction set: 2. Detailed Instructions
PASS | Latency 0.00s
Axe WCAG: 0
Assertions ✅
- ✅: Has an h1 (R): pass
- ✅: Has single h1 (BP): pass
- ✅: Has at least one h2 (R): pass
- ✅: Has a single banner (R): pass
- ✅: Has a single maincontent (R): pass
- ✅: Has at least one navigation (R): pass
- ✅: Has a single footer (R): pass
Sample 2 (Grok 4 Fast Non-Reasoning)
Instruction set: 2. Detailed Instructions
FAIL | Latency 0.00s
Axe WCAG: 29
Assertions ✅
- ✅: Has an h1 (R): pass
- ✅: Has single h1 (BP): pass
- ✅: Has at least one h2 (R): pass
- ✅: Has a single banner (R): pass
- ✅: Has a single maincontent (R): pass
- ✅: Has at least one navigation (R): pass
- ✅: Has a single footer (R): pass
Axe WCAG Failures (29) ❌
- (29x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Sample 3 (Grok 4 Fast Non-Reasoning)
Instruction set: 2. Detailed Instructions
FAIL | Latency 0.00s
Axe WCAG: 3
Assertions ✅
- ✅: Has an h1 (R): pass
- ✅: Has single h1 (BP): pass
- ✅: Has at least one h2 (R): pass
- ✅: Has a single banner (R): pass
- ✅: Has a single maincontent (R): pass
- ✅: Has at least one navigation (R): pass
- ✅: Has a single footer (R): pass
Axe WCAG Failures (3) ❌
- (3x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Sample 4 (Grok 4 Fast Non-Reasoning)
Instruction set: 2. Detailed Instructions
FAIL | Latency 0.00s
Axe WCAG: 3
Assertions ✅
- ✅: Has an h1 (R): pass
- ✅: Has single h1 (BP): pass
- ✅: Has at least one h2 (R): pass
- ✅: Has a single banner (R): pass
- ✅: Has a single maincontent (R): pass
- ✅: Has at least one navigation (R): pass
- ✅: Has a single footer (R): pass
Axe WCAG Failures (3) ❌
- (3x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
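The `color-contrast` failures listed above refer to the WCAG 2 AA minimum contrast ratio, which is 4.5:1 for normal-size text and 3:1 for large text. The sketch below computes that ratio from two hex colors using the WCAG 2 relative-luminance formula. The function names are illustrative, and axe-core's own rule does more work than this (for example, resolving computed styles and stacked backgrounds), so treat it as a way to sanity-check a color pair rather than a replacement for the scanner.

```python
def _linearize(channel: int) -> float:
    """Convert an 8-bit sRGB channel to its linear value (WCAG 2 definition)."""
    c = channel / 255
    return c / 12.92 if c <= 0.03928 else ((c + 0.055) / 1.055) ** 2.4


def relative_luminance(hex_color: str) -> float:
    """Relative luminance of a #rrggbb color."""
    h = hex_color.lstrip("#")
    r, g, b = (int(h[i:i + 2], 16) for i in (0, 2, 4))
    return 0.2126 * _linearize(r) + 0.7152 * _linearize(g) + 0.0722 * _linearize(b)


def contrast_ratio(foreground: str, background: str) -> float:
    """Contrast ratio (lighter + 0.05) / (darker + 0.05), from 1.0 up to 21.0."""
    lighter, darker = sorted(
        (relative_luminance(foreground), relative_luminance(background)),
        reverse=True,
    )
    return (lighter + 0.05) / (darker + 0.05)


def meets_aa(foreground: str, background: str, large_text: bool = False) -> bool:
    """Check the AA threshold: 4.5:1 for normal text, 3:1 for large text."""
    return contrast_ratio(foreground, background) >= (3.0 if large_text else 4.5)
```

For example, `meets_aa("#767676", "#ffffff")` passes while the slightly lighter `#777777` on white falls just short for normal-size text, which is the kind of near miss that can account for several `color-contrast` nodes on one page.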
simple-contact-form
Prompt
Create an HTML file with a contact form containing three text input fields:
- Full Name (required field)
- Email Address (required field, with helper text explaining the expected format)
- Phone Number (optional field, with helper text about the preferred format)
Wrap each field (label, input, and any helper text) in a div with class `form-field`. Include a submit button.
DeepSeek V3.2
— 0%
— 0%
— 0%
— 0%
Samples: 25 | Passes: 0
| pass@1 | pass@5 | pass@10 |
|---|---|---|
| 0% | 0% | 0% |
Samples: 5 | Passes: 0
| pass@1 | pass@5 | pass@10 |
|---|---|---|
| 0% | 0% | 0% |
Samples: 5 | Passes: 0
| pass@1 | pass@5 | pass@10 |
|---|---|---|
| 0% | 0% | 0% |
Samples: 5 | Passes: 0
| pass@1 | pass@5 | pass@10 |
|---|---|---|
| 0% | 0% | 0% |
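The pass@k columns can be reproduced from the sample and pass counts alone. The sketch below assumes the commonly used unbiased estimator, 1 - C(n - c, k) / C(n, k), where n is the number of samples and c the number of passes. The report does not state which formula it uses, so this is an assumption, as is clamping k to n when fewer than k samples exist (as in the 5-sample tables).

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Estimate the probability that at least one of k drawn samples passes.

    n: total samples, c: passing samples, k: samples drawn without replacement.
    Clamping k to n is an assumption made for this sketch.
    """
    if not 0 <= c <= n or n <= 0 or k <= 0:
        raise ValueError("require n > 0, k > 0 and 0 <= c <= n")
    k = min(k, n)
    # comb(n - c, k) is 0 when fewer than k failing samples exist.
    return 1.0 - comb(n - c, k) / comb(n, k)
```

With 0 passes the estimate is 0% for every k, which matches the rows above. As an illustrative case, 9 passes out of 25 samples gives roughly 36% for pass@1 and 92% for pass@5.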
Sample 0 (DeepSeek V3.2)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 1 | BP: 5
Assertions ❌
- ✅: Each text input has an accessible name (R): pass
- ✅: Visible label is included in accessible name (R): pass
- ✅: Each text input has textbox role (R): pass - Text input fields with textbox role found
- ❌: Helper text is programmatically associated (R): fail - Found `Please enter a valid email address (e.g., name@example.com).` Found `Optional. Preferred format: 123-456-7890.`
- ✅: Text inputs are keyboard focusable (R): pass
- ✅: Visual labels are defined and persistent (R): pass
- ✅: Required fields are indicated (visually and programmatically) (R): pass
- ❌: Inputs use appropriate autocomplete for purpose (R): fail
- ✅: Placeholder text is programmatically defined as a property (R): pass - No placeholder text present on text inputs
Axe WCAG Failures (1) ❌
- (1x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (5) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
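The two failing assertions in this sample concern helper text that is visible but not programmatically tied to its input, and inputs that lack an `autocomplete` value. One common remedy is to point `aria-describedby` at the helper element's `id` and to set `autocomplete` tokens such as `name`, `email`, or `tel`. The sketch below shows one such field as a string and a small, hypothetical audit (`FieldAudit`, `audit`) that checks both properties. It is not the benchmark's assertion code, and the ids and helper wording are made up for illustration.

```python
from html.parser import HTMLParser

# Illustrative markup only: ids and helper wording are assumptions.
FIELD = """
<div class="form-field">
  <label for="email">Email Address</label>
  <input type="email" id="email" name="email" required
         autocomplete="email" aria-describedby="email-hint">
  <p id="email-hint">Format: name@example.com</p>
</div>
"""


class FieldAudit(HTMLParser):
    """Collects every element id and the attributes of each <input>."""

    def __init__(self):
        super().__init__()
        self.ids = set()
        self.inputs = []

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        if attributes.get("id"):
            self.ids.add(attributes["id"])
        if tag == "input":
            self.inputs.append(attributes)


def audit(html: str) -> list:
    """Report, per input, whether aria-describedby resolves and autocomplete is set."""
    parser = FieldAudit()
    parser.feed(html)
    results = []
    for attributes in parser.inputs:
        refs = (attributes.get("aria-describedby") or "").split()
        results.append({
            "id": attributes.get("id"),
            # True only if at least one id is referenced and all of them exist.
            "helper_associated": bool(refs) and all(r in parser.ids for r in refs),
            "has_autocomplete": bool(attributes.get("autocomplete")),
        })
    return results
```

Note that this sketch reports `helper_associated` as false for any input without `aria-describedby`, including fields that have no helper text at all, so a real check would first need to detect whether helper text is present.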
Sample 1 (DeepSeek V3.2)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 1 | BP: 5
Assertions ❌
- ✅: Each text input has an accessible name (R): pass
- ✅: Visible label is included in accessible name (R): pass
- ✅: Each text input has textbox role (R): pass - Text input fields with textbox role found
- ❌: Helper text is programmatically associated (R): fail - Found `Please enter a valid email address (e.g., name@example.com).` Found `Optional. Preferred format: 123-456-7890.`
- ✅: Text inputs are keyboard focusable (R): pass
- ✅: Visual labels are defined and persistent (R): pass
- ✅: Required fields are indicated (visually and programmatically) (R): pass
- ❌: Inputs use appropriate autocomplete for purpose (R): fail
- ✅: Placeholder text is programmatically defined as a property (R): pass - No placeholder text present on text inputs
Axe WCAG Failures (1) ❌
- (1x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (5) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 2 (DeepSeek V3.2)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 1 | BP: 5
Assertions ❌
- ✅: Each text input has an accessible name (R): pass
- ✅: Visible label is included in accessible name (R): pass
- ✅: Each text input has textbox role (R): pass - Text input fields with textbox role found
- ❌: Helper text is programmatically associated (R): fail - Found `Please enter a valid email address (e.g., name@example.com).` Found `Optional. Preferred format: 123-456-7890.`
- ✅: Text inputs are keyboard focusable (R): pass
- ✅: Visual labels are defined and persistent (R): pass
- ✅: Required fields are indicated (visually and programmatically) (R): pass
- ❌: Inputs use appropriate autocomplete for purpose (R): fail
- ✅: Placeholder text is programmatically defined as a property (R): pass - No placeholder text present on text inputs
Axe WCAG Failures (1) ❌
- (1x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (5) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 3 (DeepSeek V3.2)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 1 | BP: 5
Assertions ❌
- ✅: Each text input has an accessible name (R): pass
- ✅: Visible label is included in accessible name (R): pass
- ✅: Each text input has textbox role (R): pass - Text input fields with textbox role found
- ❌: Helper text is programmatically associated (R): fail - Found `Please enter a valid email address (e.g., name@example.com).` Found `Optional. Preferred format: 123-456-7890.`
- ✅: Text inputs are keyboard focusable (R): pass
- ✅: Visual labels are defined and persistent (R): pass
- ✅: Required fields are indicated (visually and programmatically) (R): pass
- ❌: Inputs use appropriate autocomplete for purpose (R): fail
- ✅: Placeholder text is programmatically defined as a property (R): pass - No placeholder text present on text inputs
Axe WCAG Failures (1) ❌
- (1x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (5) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 4 (DeepSeek V3.2)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 1 | BP: 5
Assertions ❌
- ✅: Each text input has an accessible name (R): pass
- ✅: Visible label is included in accessible name (R): pass
- ✅: Each text input has textbox role (R): pass - Text input fields with textbox role found
- ❌: Helper text is programmatically associated (R): fail - Found `Please enter a valid email address (e.g., name@example.com).` Found `Optional. Preferred format: 123-456-7890.`
- ✅: Text inputs are keyboard focusable (R): pass
- ✅: Visual labels are defined and persistent (R): pass
- ✅: Required fields are indicated (visually and programmatically) (R): pass
- ❌: Inputs use appropriate autocomplete for purpose (R): fail
- ✅: Placeholder text is programmatically defined as a property (R): pass - No placeholder text present on text inputs
Axe WCAG Failures (1) ❌
- (1x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (5) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 5 (DeepSeek V3.2)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 1 | BP: 5
Assertions ❌
- ✅: Each text input has an accessible name (R): pass
- ✅: Visible label is included in accessible name (R): pass
- ✅: Each text input has textbox role (R): pass - Text input fields with textbox role found
- ❌: Helper text is programmatically associated (R): fail - Found `Please enter a valid email address (e.g., name@example.com).` Found `Optional. Preferred format: 123-456-7890.`
- ✅: Text inputs are keyboard focusable (R): pass
- ✅: Visual labels are defined and persistent (R): pass
- ✅: Required fields are indicated (visually and programmatically) (R): pass
- ❌: Inputs use appropriate autocomplete for purpose (R): fail
- ✅: Placeholder text is programmatically defined as a property (R): pass - No placeholder text present on text inputs
Axe WCAG Failures (1) ❌
- (1x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (5) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 6 (DeepSeek V3.2)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 1 | BP: 5
Assertions ❌
- ✅: Each text input has an accessible name (R): pass
- ✅: Visible label is included in accessible name (R): pass
- ✅: Each text input has textbox role (R): pass - Text input fields with textbox role found
- ❌: Helper text is programmatically associated (R): fail - Found `Please enter a valid email address (e.g., name@example.com).` Found `Optional. Preferred format: 123-456-7890.`
- ✅: Text inputs are keyboard focusable (R): pass
- ✅: Visual labels are defined and persistent (R): pass
- ✅: Required fields are indicated (visually and programmatically) (R): pass
- ❌: Inputs use appropriate autocomplete for purpose (R): fail
- ✅: Placeholder text is programmatically defined as a property (R): pass - No placeholder text present on text inputs
Axe WCAG Failures (1) ❌
- (1x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (5) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 7 (DeepSeek V3.2)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 1 | BP: 5
Assertions ❌
- ✅: Each text input has an accessible name (R): pass
- ✅: Visible label is included in accessible name (R): pass
- ✅: Each text input has textbox role (R): pass - Text input fields with textbox role found
- ❌: Helper text is programmatically associated (R): fail - Found `Please enter a valid email address (e.g., name@example.com).` Found `Optional. Please use format: 123-456-7890.`
- ✅: Text inputs are keyboard focusable (R): pass
- ✅: Visual labels are defined and persistent (R): pass
- ✅: Required fields are indicated (visually and programmatically) (R): pass
- ❌: Inputs use appropriate autocomplete for purpose (R): fail
- ✅: Placeholder text is programmatically defined as a property (R): pass - No placeholder text present on text inputs
Axe WCAG Failures (1) ❌
- (1x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (5) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 8 (DeepSeek V3.2)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 1 | BP: 5
Assertions ❌
- ✅: Each text input has an accessible name (R): pass
- ✅: Visible label is included in accessible name (R): pass
- ✅: Each text input has textbox role (R): pass - Text input fields with textbox role found
- ❌: Helper text is programmatically associated (R): fail - Found `Please enter a valid email address (e.g., name@example.com).` Found `Optional. Preferred format: 123-456-7890.`
- ✅: Text inputs are keyboard focusable (R): pass
- ✅: Visual labels are defined and persistent (R): pass
- ✅: Required fields are indicated (visually and programmatically) (R): pass
- ❌: Inputs use appropriate autocomplete for purpose (R): fail
- ✅: Placeholder text is programmatically defined as a property (R): pass - No placeholder text present on text inputs
Axe WCAG Failures (1) ❌
- (1x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (5) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 9 (DeepSeek V3.2)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 1 | BP: 5
Assertions ❌
- ✅: Each text input has an accessible name (R): pass
- ✅: Visible label is included in accessible name (R): pass
- ✅: Each text input has textbox role (R): pass - Text input fields with textbox role found
- ❌: Helper text is programmatically associated (R): fail - Found `Please enter a valid email address (e.g., name@example.com).` Found `Optional. Preferred format: 123-456-7890.`
- ✅: Text inputs are keyboard focusable (R): pass
- ✅: Visual labels are defined and persistent (R): pass
- ✅: Required fields are indicated (visually and programmatically) (R): pass
- ❌: Inputs use appropriate autocomplete for purpose (R): fail
- ✅: Placeholder text is programmatically defined as a property (R): pass - No placeholder text present on text inputs
Axe WCAG Failures (1) ❌
- (1x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (5) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 10 (DeepSeek V3.2)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 1 | BP: 5
Assertions ❌
- ✅: Each text input has an accessible name (R): pass
- ✅: Visible label is included in accessible name (R): pass
- ✅: Each text input has textbox role (R): pass - Text input fields with textbox role found
- ❌: Helper text is programmatically associated (R): fail - Found `Please enter a valid email address (e.g., name@example.com).` Found `Optional. Preferred format: 123-456-7890.`
- ✅: Text inputs are keyboard focusable (R): pass
- ✅: Visual labels are defined and persistent (R): pass
- ✅: Required fields are indicated (visually and programmatically) (R): pass
- ❌: Inputs use appropriate autocomplete for purpose (R): fail
- ✅: Placeholder text is programmatically defined as a property (R): pass - No placeholder text present on text inputs
Axe WCAG Failures (1) ❌
- (1x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (5) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 11 (DeepSeek V3.2)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 1 | BP: 5
Assertions ❌
- ✅: Each text input has an accessible name (R): pass
- ✅: Visible label is included in accessible name (R): pass
- ✅: Each text input has textbox role (R): pass - Text input fields with textbox role found
- ❌: Helper text is programmatically associated (R): fail - Found `Please enter a valid email address (e.g., name@example.com).` Found `Optional. Preferred format: 123-456-7890.`
- ✅: Text inputs are keyboard focusable (R): pass
- ✅: Visual labels are defined and persistent (R): pass
- ✅: Required fields are indicated (visually and programmatically) (R): pass
- ❌: Inputs use appropriate autocomplete for purpose (R): fail
- ✅: Placeholder text is programmatically defined as a property (R): pass - No placeholder text present on text inputs
Axe WCAG Failures (1) ❌
- (1x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (5) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 12 (DeepSeek V3.2)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 1 | BP: 5
Assertions ❌
- ✅: Each text input has an accessible name (R): pass
- ✅: Visible label is included in accessible name (R): pass
- ✅: Each text input has textbox role (R): pass - Text input fields with textbox role found
- ❌: Helper text is programmatically associated (R): fail - Found `Please enter a valid email address (e.g., name@example.com).` Found `Optional. Preferred format: 123-456-7890.`
- ✅: Text inputs are keyboard focusable (R): pass
- ✅: Visual labels are defined and persistent (R): pass
- ✅: Required fields are indicated (visually and programmatically) (R): pass
- ❌: Inputs use appropriate autocomplete for purpose (R): fail
- ✅: Placeholder text is programmatically defined as a property (R): pass - No placeholder text present on text inputs
Axe WCAG Failures (1) ❌
- (1x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (5) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 13 (DeepSeek V3.2)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 1 | BP: 5
Assertions ❌
- ✅: Each text input has an accessible name (R): pass
- ✅: Visible label is included in accessible name (R): pass
- ✅: Each text input has textbox role (R): pass - Text input fields with textbox role found
- ❌: Helper text is programmatically associated (R): fail - Found `Please enter a valid email address (e.g., name@example.com).` Found `Optional. Please use format: 123-456-7890.`
- ✅: Text inputs are keyboard focusable (R): pass
- ✅: Visual labels are defined and persistent (R): pass
- ✅: Required fields are indicated (visually and programmatically) (R): pass
- ❌: Inputs use appropriate autocomplete for purpose (R): fail
- ✅: Placeholder text is programmatically defined as a property (R): pass - No placeholder text present on text inputs
Axe WCAG Failures (1) ❌
- (1x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (5) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 14 (DeepSeek V3.2)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 1 | BP: 5
Assertions ❌
- ✅: Each text input has an accessible name (R): pass
- ✅: Visible label is included in accessible name (R): pass
- ✅: Each text input has textbox role (R): pass - Text input fields with textbox role found
- ❌: Helper text is programmatically associated (R): fail - Found `Please enter a valid email address (e.g., name@example.com).` Found `Optional. Preferred format: 123-456-7890.`
- ✅: Text inputs are keyboard focusable (R): pass
- ✅: Visual labels are defined and persistent (R): pass
- ✅: Required fields are indicated (visually and programmatically) (R): pass
- ❌: Inputs use appropriate autocomplete for purpose (R): fail
- ✅: Placeholder text is programmatically defined as a property (R): pass - No placeholder text present on text inputs
Axe WCAG Failures (1) ❌
- (1x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (5) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 15 (DeepSeek V3.2)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 1 | BP: 5
Assertions ❌
- ✅: Each text input has an accessible name (R): pass
- ✅: Visible label is included in accessible name (R): pass
- ✅: Each text input has textbox role (R): pass - Text input fields with textbox role found
- ❌: Helper text is programmatically associated (R): fail - Found `Please enter a valid email address (e.g., name@example.com).` Found `Optional. Preferred format: 123-456-7890.`
- ✅: Text inputs are keyboard focusable (R): pass
- ✅: Visual labels are defined and persistent (R): pass
- ✅: Required fields are indicated (visually and programmatically) (R): pass
- ❌: Inputs use appropriate autocomplete for purpose (R): fail
- ✅: Placeholder text is programmatically defined as a property (R): pass - No placeholder text present on text inputs
Axe WCAG Failures (1) ❌
- (1x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (5) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 16 (DeepSeek V3.2)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 1 | BP: 5
Assertions ❌
- ✅: Each text input has an accessible name (R): pass
- ✅: Visible label is included in accessible name (R): pass
- ✅: Each text input has textbox role (R): pass - Text input fields with textbox role found
- ❌: Helper text is programmatically associated (R): fail - Found `Please enter a valid email address (e.g., name@example.com).` Found `Optional. Preferred format: 123-456-7890.`
- ✅: Text inputs are keyboard focusable (R): pass
- ✅: Visual labels are defined and persistent (R): pass
- ✅: Required fields are indicated (visually and programmatically) (R): pass
- ❌: Inputs use appropriate autocomplete for purpose (R): fail
- ✅: Placeholder text is programmatically defined as a property (R): pass - No placeholder text present on text inputs
Axe WCAG Failures (1) ❌
- (1x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (5) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 17 (DeepSeek V3.2)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 1 | BP: 5
Assertions ❌
- ✅: Each text input has an accessible name (R): pass
- ✅: Visible label is included in accessible name (R): pass
- ✅: Each text input has textbox role (R): pass - Text input fields with textbox role found
- ❌: Helper text is programmatically associated (R): fail - Found `Please enter a valid email address (e.g., name@example.com).` Found `Optional. Preferred format: 123-456-7890.`
- ✅: Text inputs are keyboard focusable (R): pass
- ✅: Visual labels are defined and persistent (R): pass
- ✅: Required fields are indicated (visually and programmatically) (R): pass
- ❌: Inputs use appropriate autocomplete for purpose (R): fail
- ✅: Placeholder text is programmatically defined as a property (R): pass - No placeholder text present on text inputs
Axe WCAG Failures (1) ❌
- (1x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (5) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 18 (DeepSeek V3.2)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 1 | BP: 5
Assertions ❌
- ✅: Each text input has an accessible name (R): pass
- ✅: Visible label is included in accessible name (R): pass
- ✅: Each text input has textbox role (R): pass - Text input fields with textbox role found
- ❌: Helper text is programmatically associated (R): fail - Found `Please enter a valid email address (e.g., name@example.com).` Found `Optional. Preferred format: 123-456-7890.`
- ✅: Text inputs are keyboard focusable (R): pass
- ✅: Visual labels are defined and persistent (R): pass
- ✅: Required fields are indicated (visually and programmatically) (R): pass
- ❌: Inputs use appropriate autocomplete for purpose (R): fail
- ✅: Placeholder text is programmatically defined as a property (R): pass - No placeholder text present on text inputs
Axe WCAG Failures (1) ❌
- (1x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (5) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 19 (DeepSeek V3.2)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 1 | BP: 5
Assertions ❌
- ✅: Each text input has an accessible name (R): pass
- ✅: Visible label is included in accessible name (R): pass
- ✅: Each text input has textbox role (R): pass - Text input fields with textbox role found
- ❌: Helper text is programmatically associated (R): fail - Found `Please enter a valid email address (e.g., name@example.com).` Found `Optional. Preferred format: 123-456-7890.`
- ✅: Text inputs are keyboard focusable (R): pass
- ✅: Visual labels are defined and persistent (R): pass
- ✅: Required fields are indicated (visually and programmatically) (R): pass
- ❌: Inputs use appropriate autocomplete for purpose (R): fail
- ✅: Placeholder text is programmatically defined as a property (R): pass - No placeholder text present on text inputs
Axe WCAG Failures (1) ❌
- (1x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (5) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 20 (DeepSeek V3.2)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 1 | BP: 5
Assertions ❌
- ✅: Each text input has an accessible name (R): pass
- ✅: Visible label is included in accessible name (R): pass
- ✅: Each text input has textbox role (R): pass - Text input fields with textbox role found
- ❌: Helper text is programmatically associated (R): fail - Found `Please enter a valid email address (e.g., name@example.com).` Found `Optional. Preferred format: 123-456-7890.`
- ✅: Text inputs are keyboard focusable (R): pass
- ✅: Visual labels are defined and persistent (R): pass
- ✅: Required fields are indicated (visually and programmatically) (R): pass
- ❌: Inputs use appropriate autocomplete for purpose (R): fail
- ✅: Placeholder text is programmatically defined as a property (R): pass - No placeholder text present on text inputs
Axe WCAG Failures (1) ❌
- (1x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (5) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 21 (DeepSeek V3.2)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 1 | BP: 5
Assertions ❌
- ✅: Each text input has an accessible name (R): pass
- ✅: Visible label is included in accessible name (R): pass
- ✅: Each text input has textbox role (R): pass - Text input fields with textbox role found
- ❌: Helper text is programmatically associated (R): fail - Found `Please enter a valid email address (e.g., name@example.com).` Found `Optional. Preferred format: 123-456-7890.`
- ✅: Text inputs are keyboard focusable (R): pass
- ✅: Visual labels are defined and persistent (R): pass
- ✅: Required fields are indicated (visually and programmatically) (R): pass
- ❌: Inputs use appropriate autocomplete for purpose (R): fail
- ✅: Placeholder text is programmatically defined as a property (R): pass - No placeholder text present on text inputs
Axe WCAG Failures (1) ❌
- (1x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (5) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 22 (DeepSeek V3.2)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 1 | BP: 5
Assertions ❌
- ✅: Each text input has an accessible name (R): pass
- ✅: Visible label is included in accessible name (R): pass
- ✅: Each text input has textbox role (R): pass - Text input fields with textbox role found
- ❌: Helper text is programmatically associated (R): fail - Found `Please enter a valid email address (e.g., name@example.com).` Found `Optional. Preferred format: 123-456-7890.`
- ✅: Text inputs are keyboard focusable (R): pass
- ✅: Visual labels are defined and persistent (R): pass
- ✅: Required fields are indicated (visually and programmatically) (R): pass
- ❌: Inputs use appropriate autocomplete for purpose (R): fail
- ✅: Placeholder text is programmatically defined as a property (R): pass - No placeholder text present on text inputs
Axe WCAG Failures (1) ❌
- (1x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (5) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 23 (DeepSeek V3.2)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 1 | BP: 5
Assertions ❌
- ✅: Each text input has an accessible name (R): pass
- ✅: Visible label is included in accessible name (R): pass
- ✅: Each text input has textbox role (R): pass - Text input fields with textbox role found
- ❌: Helper text is programmatically associated (R): fail - Found `Please enter a valid email address (e.g., name@example.com).` Found `Optional. Preferred format: 123-456-7890.`
- ✅: Text inputs are keyboard focusable (R): pass
- ✅: Visual labels are defined and persistent (R): pass
- ✅: Required fields are indicated (visually and programmatically) (R): pass
- ❌: Inputs use appropriate autocomplete for purpose (R): fail
- ✅: Placeholder text is programmatically defined as a property (R): pass - No placeholder text present on text inputs
Axe WCAG Failures (1) ❌
- (1x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (5) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 24 (DeepSeek V3.2)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 1 | BP: 5
Assertions ❌
- ✅: Each text input has an accessible name (R): pass
- ✅: Visible label is included in accessible name (R): pass
- ✅: Each text input has textbox role (R): pass - Text input fields with textbox role found
- ❌: Helper text is programmatically associated (R): fail - Found `Please enter a valid email address (e.g., name@example.com).` Found `Optional. Preferred format: 123-456-7890.`
- ✅: Text inputs are keyboard focusable (R): pass
- ✅: Visual labels are defined and persistent (R): pass
- ✅: Required fields are indicated (visually and programmatically) (R): pass
- ❌: Inputs use appropriate autocomplete for purpose (R): fail
- ✅: Placeholder text is programmatically defined as a property (R): pass - No placeholder text present on text inputs
Axe WCAG Failures (1) ❌
- (1x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (5) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
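The `Helper text is programmatically associated` failure recurs in every Control sample above. It usually means hint text sits next to an input without being referenced by it. One common way to make the association is an `aria-describedby` attribute that points at the hint's `id`. The sketch below is not the harness's actual assertion; it is a minimal illustration built on Python's standard `html.parser`. The names `DescribedByChecker` and `unassociated_inputs` and the sample markup are hypothetical.

```python
from html.parser import HTMLParser


class DescribedByChecker(HTMLParser):
    """Collects element ids and each input's aria-describedby references."""

    def __init__(self):
        super().__init__()
        self.ids = set()
        self.inputs = []  # (input id or name, list of referenced ids)

    def handle_starttag(self, tag, attrs):
        attr = dict(attrs)
        if attr.get("id"):
            self.ids.add(attr["id"])
        if tag == "input":
            refs = (attr.get("aria-describedby") or "").split()
            self.inputs.append((attr.get("id") or attr.get("name") or "?", refs))


def unassociated_inputs(html):
    """Return inputs whose aria-describedby resolves to no element id."""
    checker = DescribedByChecker()
    checker.feed(html)
    checker.close()
    return [
        name
        for name, refs in checker.inputs
        if not any(ref in checker.ids for ref in refs)
    ]


# Hypothetical markup: the hint is visible but not referenced by the input.
BAD = """
<label for="email">Email</label>
<input id="email" type="email" required>
<p class="hint">Please enter a valid email address (e.g., name@example.com).</p>
"""

# Same form, with the hint referenced through aria-describedby.
GOOD = """
<label for="email">Email</label>
<input id="email" type="email" required aria-describedby="email-hint">
<p id="email-hint">Please enter a valid email address (e.g., name@example.com).</p>
"""

print(unassociated_inputs(BAD))   # ['email']
print(unassociated_inputs(GOOD))  # []
```

The function reports every input that lacks a resolvable description, so an input with no helper text at all is also listed; a real check would first decide which inputs have helper text nearby.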
Sample 0 (DeepSeek V3.2)
Instruction set: 1. Basic
FAIL | Latency 0.00s
Axe WCAG: 0 | BP: 5
Assertions ❌
- ✅: Each text input has an accessible name (R): pass
- ✅: Visible label is included in accessible name (R): pass
- ✅: Each text input has textbox role (R): pass - Text input fields with textbox role found
- ✅: Helper text is programmatically associated (R): pass
- ✅: Text inputs are keyboard focusable (R): pass
- ✅: Visual labels are defined and persistent (R): pass
- ✅: Required fields are indicated (visually and programmatically) (R): pass
- ❌: Inputs use appropriate autocomplete for purpose (R): fail
- ✅: Placeholder text is programmatically defined as a property (R): pass - No placeholder text present on text inputs
Axe Best Practice Issues (5) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 1 (DeepSeek V3.2)
Instruction set: 1. Basic
FAIL | Latency 0.00s
Axe WCAG: 1
Assertions ❌
- ✅: Each text input has an accessible name (R): pass
- ✅: Visible label is included in accessible name (R): pass
- ✅: Each text input has textbox role (R): pass - Text input fields with textbox role found
- ❌: Helper text is programmatically associated (R): fail - Found `Please enter your first and last name.`
- ✅: Text inputs are keyboard focusable (R): pass
- ✅: Visual labels are defined and persistent (R): pass
- ✅: Required fields are indicated (visually and programmatically) (R): pass
- ❌: Inputs use appropriate autocomplete for purpose (R): fail
- ✅: Placeholder text is programmatically defined as a property (R): pass - No placeholder text present on text inputs
Axe WCAG Failures (1) ❌
- (1x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Sample 2 (DeepSeek V3.2)
Instruction set: 1. Basic
FAIL | Latency 0.00s
Axe WCAG: 0 | BP: 2
Assertions ❌
- ✅: Each text input has an accessible name (R): pass
- ✅: Visible label is included in accessible name (R): pass
- ✅: Each text input has textbox role (R): pass - Text input fields with textbox role found
- ✅: Helper text is programmatically associated (R): pass
- ✅: Text inputs are keyboard focusable (R): pass
- ✅: Visual labels are defined and persistent (R): pass
- ✅: Required fields are indicated (visually and programmatically) (R): pass
- ❌: Inputs use appropriate autocomplete for purpose (R): fail
- ✅: Placeholder text is programmatically defined as a property (R): pass - No placeholder text present on text inputs
Axe Best Practice Issues (2) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 3 (DeepSeek V3.2)
Instruction set: 1. Basic
FAIL | Latency 0.00s
Axe WCAG: 1 | BP: 5
Assertions ❌
- ✅: Each text input has an accessible name (R): pass
- ✅: Visible label is included in accessible name (R): pass
- ✅: Each text input has textbox role (R): pass - Text input fields with textbox role found
- ❌: Helper text is programmatically associated (R): fail - Found `Please enter your full legal name.`
- ✅: Text inputs are keyboard focusable (R): pass
- ✅: Visual labels are defined and persistent (R): pass
- ✅: Required fields are indicated (visually and programmatically) (R): pass
- ❌: Inputs use appropriate autocomplete for purpose (R): fail
- ✅: Placeholder text is programmatically defined as a property (R): pass - No placeholder text present on text inputs
Axe WCAG Failures (1) ❌
- (1x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (5) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 4 (DeepSeek V3.2)
Instruction set: 1. Basic
FAIL | Latency 0.00s
Axe WCAG: 0 | BP: 8
Assertions ❌
- ✅: Each text input has an accessible name (R): pass
- ✅: Visible label is included in accessible name (R): pass
- ✅: Each text input has textbox role (R): pass - Text input fields with textbox role found
- ✅: Helper text is programmatically associated (R): pass
- ✅: Text inputs are keyboard focusable (R): pass
- ✅: Visual labels are defined and persistent (R): pass
- ✅: Required fields are indicated (visually and programmatically) (R): pass
- ❌: Inputs use appropriate autocomplete for purpose (R): fail
- ✅: Placeholder text is programmatically defined as a property (R): pass - No placeholder text present on text inputs
Axe Best Practice Issues (8) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
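Every Basic sample above fails `Inputs use appropriate autocomplete for purpose`, which relates to identifying input purpose through the HTML `autocomplete` attribute (tokens such as `name`, `email`, and `tel`). The sketch below shows one way such a check could look. It is an assumption about the general technique, not the harness's implementation; the `EXPECTED_TOKENS` mapping, the `AutocompleteChecker` class, and the sample form are hypothetical.

```python
from html.parser import HTMLParser

# Hypothetical mapping from a field's name attribute to the autocomplete
# token one would usually expect; a real form needs a fuller table.
EXPECTED_TOKENS = {"name": "name", "email": "email", "phone": "tel"}


class AutocompleteChecker(HTMLParser):
    """Records inputs whose autocomplete attribute lacks the expected token."""

    def __init__(self):
        super().__init__()
        self.missing = []  # (field name, expected token)

    def handle_starttag(self, tag, attrs):
        if tag != "input":
            return
        attr = dict(attrs)
        expected = EXPECTED_TOKENS.get(attr.get("name") or "")
        actual = (attr.get("autocomplete") or "").split()
        if expected and expected not in actual:
            self.missing.append((attr.get("name"), expected))


def missing_autocomplete(html):
    """Return (field name, expected token) pairs for inputs that need fixing."""
    checker = AutocompleteChecker()
    checker.feed(html)
    checker.close()
    return checker.missing


# Hypothetical form: only the email field lacks its autocomplete token.
FORM = """
<input name="name" type="text" autocomplete="name">
<input name="email" type="email">
<input name="phone" type="tel" autocomplete="tel">
"""

print(missing_autocomplete(FORM))  # [('email', 'email')]
```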
Sample 0 (DeepSeek V3.2)
Instruction set: 0. Minimal
FAIL | Latency 0.00s
Axe WCAG: 1 | BP: 5
Assertions ❌
- ✅: Each text input has an accessible name (R): pass
- ✅: Visible label is included in accessible name (R): pass
- ✅: Each text input has textbox role (R): pass - Text input fields with textbox role found
- ❌: Helper text is programmatically associated (R): fail - Found `Please enter a valid email address (e.g., name@example.com).` Found `Optional. Preferred format: 123-456-7890.`
- ✅: Text inputs are keyboard focusable (R): pass
- ✅: Visual labels are defined and persistent (R): pass
- ✅: Required fields are indicated (visually and programmatically) (R): pass
- ❌: Inputs use appropriate autocomplete for purpose (R): fail
- ✅: Placeholder text is programmatically defined as a property (R): pass - No placeholder text present on text inputs
Axe WCAG Failures (1) ❌
- (1x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (5) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 1 (DeepSeek V3.2)
Instruction set: 0. Minimal
FAIL | Latency 0.00s
Axe WCAG: 0 | BP: 5
Assertions ❌
- ✅: Each text input has an accessible name (R): pass
- ✅: Visible label is included in accessible name (R): pass
- ✅: Each text input has textbox role (R): pass - Text input fields with textbox role found
- ❌: Helper text is programmatically associated (R): fail - Found `Please enter a valid email address (e.g., name@example.com).` Found `Optional. Please use format: 123-456-7890.`
- ✅: Text inputs are keyboard focusable (R): pass
- ✅: Visual labels are defined and persistent (R): pass
- ✅: Required fields are indicated (visually and programmatically) (R): pass
- ❌: Inputs use appropriate autocomplete for purpose (R): fail
- ✅: Placeholder text is programmatically defined as a property (R): pass - No placeholder text present on text inputs
Axe Best Practice Issues (5) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 2 (DeepSeek V3.2)
Instruction set: 0. Minimal
FAIL | Latency 0.00s
Axe WCAG: 1 | BP: 5
Assertions ❌
- ✅: Each text input has an accessible name (R): pass
- ✅: Visible label is included in accessible name (R): pass
- ✅: Each text input has textbox role (R): pass - Text input fields with textbox role found
- ❌: Helper text is programmatically associated (R): fail - Found `Please enter a valid email address (e.g., name@example.com).` Found `Optional. Preferred format: 123-456-7890.`
- ✅: Text inputs are keyboard focusable (R): pass
- ✅: Visual labels are defined and persistent (R): pass
- ✅: Required fields are indicated (visually and programmatically) (R): pass
- ❌: Inputs use appropriate autocomplete for purpose (R): fail
- ✅: Placeholder text is programmatically defined as a property (R): pass - No placeholder text present on text inputs
Axe WCAG Failures (1) ❌
- (1x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (5) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 3 (DeepSeek V3.2)
Instruction set: 0. Minimal
FAIL | Latency 0.00s
Axe WCAG: 1 | BP: 5
Assertions ❌
- ✅: Each text input has an accessible name (R): pass
- ✅: Visible label is included in accessible name (R): pass
- ✅: Each text input has textbox role (R): pass - Text input fields with textbox role found
- ❌: Helper text is programmatically associated (R): fail - Found `Please enter a valid email address (e.g., name@example.com).` Found `Optional. Preferred format: 123-456-7890.`
- ✅: Text inputs are keyboard focusable (R): pass
- ✅: Visual labels are defined and persistent (R): pass
- ✅: Required fields are indicated (visually and programmatically) (R): pass
- ❌: Inputs use appropriate autocomplete for purpose (R): fail
- ✅: Placeholder text is programmatically defined as a property (R): pass - No placeholder text present on text inputs
Axe WCAG Failures (1) ❌
- (1x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (5) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 4 (DeepSeek V3.2)
Instruction set: 0. Minimal
FAIL | Latency 0.00s
Axe WCAG: 1 | BP: 5
Assertions ❌
- ✅: Each text input has an accessible name (R): pass
- ✅: Visible label is included in accessible name (R): pass
- ✅: Each text input has textbox role (R): pass - Text input fields with textbox role found
- ❌: Helper text is programmatically associated (R): fail - Found `Please enter a valid email address (e.g., name@example.com).` Found `Optional. Please use format: 123-456-7890.`
- ✅: Text inputs are keyboard focusable (R): pass
- ✅: Visual labels are defined and persistent (R): pass
- ✅: Required fields are indicated (visually and programmatically) (R): pass
- ❌: Inputs use appropriate autocomplete for purpose (R): fail
- ✅: Placeholder text is programmatically defined as a property (R): pass - No placeholder text present on text inputs
Axe WCAG Failures (1) ❌
- (1x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (5) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- page-has-heading-one (moderate): Ensure that the page, or at least one of its frames contains a level-one heading (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
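Most Minimal samples above report a `color-contrast` failure. The WCAG 2 AA threshold that rule refers to is a contrast ratio of at least 4.5:1 for normal-size text (3:1 for large text), computed from the relative luminance of the two colors. The sketch below follows the WCAG 2 formula; the function names are illustrative and the colors are hypothetical examples, not values taken from the samples.

```python
def _linearize(channel_8bit):
    """Convert an 8-bit sRGB channel to its linear value (WCAG 2 definition)."""
    c = channel_8bit / 255
    return c / 12.92 if c <= 0.03928 else ((c + 0.055) / 1.055) ** 2.4


def relative_luminance(hex_color):
    """Relative luminance of a color written as '#rrggbb'."""
    h = hex_color.lstrip("#")
    r, g, b = (int(h[i:i + 2], 16) for i in (0, 2, 4))
    return 0.2126 * _linearize(r) + 0.7152 * _linearize(g) + 0.0722 * _linearize(b)


def contrast_ratio(foreground, background):
    """Contrast ratio between two colors, from 1.0 up to 21.0."""
    l1 = relative_luminance(foreground)
    l2 = relative_luminance(background)
    lighter, darker = max(l1, l2), min(l1, l2)
    return (lighter + 0.05) / (darker + 0.05)


def meets_aa_normal_text(foreground, background):
    """True when the pair reaches the 4.5:1 AA threshold for normal text."""
    return contrast_ratio(foreground, background) >= 4.5


# Hypothetical gray-on-white text colors close to the threshold.
print(meets_aa_normal_text("#767676", "#ffffff"))  # True
print(meets_aa_normal_text("#777777", "#ffffff"))  # False
```

The two grays differ by a single step per channel, which shows how a visually similar muted text color can fall on either side of the threshold.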
Sample 0 (DeepSeek V3.2)
Instruction set: 2. Detailed Instructions
FAIL | Latency 0.00s
Axe WCAG: 0
Assertions ❌
- ✅: Each text input has an accessible name (R): pass
- ✅: Visible label is included in accessible name (R): pass
- ✅: Each text input has textbox role (R): pass - Text input fields with textbox role found
- ✅: Helper text is programmatically associated (R): pass
- ✅: Text inputs are keyboard focusable (R): pass
- ✅: Visual labels are defined and persistent (R): pass
- ✅: Required fields are indicated (visually and programmatically) (R): pass
- ❌: Inputs use appropriate autocomplete for purpose (R): fail
- ✅: Placeholder text is programmatically defined as a property (R): pass - No placeholder text present on text inputs
Sample 1 (DeepSeek V3.2)
Instruction set: 2. Detailed Instructions
FAIL | Latency 0.00s
Axe WCAG: 5
Assertions ✅
- ✅: Each text input has an accessible name (R): pass
- ✅: Visible label is included in accessible name (R): pass
- ✅: Each text input has textbox role (R): pass - Text input fields with textbox role found
- ✅: Helper text is programmatically associated (R): pass
- ✅: Text inputs are keyboard focusable (R): pass
- ✅: Visual labels are defined and persistent (R): pass
- ✅: Required fields are indicated (visually and programmatically) (R): pass
- ✅: Inputs use appropriate autocomplete for purpose (R): pass
- ✅: Placeholder text is programmatically defined as a property (R): pass - No placeholder text present on text inputs
Axe WCAG Failures (5) ❌
- (5x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
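The color-contrast rule above refers to the WCAG 2 AA minimums: a contrast ratio of at least 4.5:1 for normal text and 3:1 for large text. The sketch below computes that ratio from the WCAG 2 relative luminance definition. It assumes opaque six-digit hex colors and is not axe's implementation, which also resolves transparency and layered backgrounds; the function names are illustrative.

```python
def _linearize(channel):
    """Convert an 8-bit sRGB channel to its linear value (WCAG 2 definition)."""
    c = channel / 255
    return c / 12.92 if c <= 0.03928 else ((c + 0.055) / 1.055) ** 2.4


def relative_luminance(hex_color):
    """Relative luminance of an opaque color given as '#rrggbb'."""
    h = hex_color.lstrip("#")
    r, g, b = (int(h[i:i + 2], 16) for i in (0, 2, 4))
    return 0.2126 * _linearize(r) + 0.7152 * _linearize(g) + 0.0722 * _linearize(b)


def contrast_ratio(foreground, background):
    """Contrast ratio (lighter + 0.05) / (darker + 0.05), between 1 and 21."""
    lighter, darker = sorted(
        (relative_luminance(foreground), relative_luminance(background)),
        reverse=True,
    )
    return (lighter + 0.05) / (darker + 0.05)


def meets_aa(foreground, background, large_text=False):
    """True if the pair meets the WCAG 2 AA minimum for the given text size."""
    threshold = 3.0 if large_text else 4.5
    return contrast_ratio(foreground, background) >= threshold


print(meets_aa("#767676", "#ffffff"))  # mid-gray on white, just above 4.5:1
print(meets_aa("#777777", "#ffffff"))  # one step lighter, just below 4.5:1
```

The two gray values show how narrow the margin can be: a one-step change in each channel moves normal-size text from passing to failing.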
Sample 2 (DeepSeek V3.2)
Instruction set: 2. Detailed Instructions
FAIL | Latency 0.00s
Axe WCAG: 0
Assertions ❌
- ✅: Each text input has an accessible name (R): pass
- ✅: Visible label is included in accessible name (R): pass
- ✅: Each text input has textbox role (R): pass - Text input fields with textbox role found
- ✅: Helper text is programmatically associated (R): pass
- ✅: Text inputs are keyboard focusable (R): pass
- ✅: Visual labels are defined and persistent (R): pass
- ✅: Required fields are indicated (visually and programmatically) (R): pass
- ❌: Inputs use appropriate autocomplete for purpose (R): fail
- ✅: Placeholder text is programmatically defined as a property (R): pass - No placeholder text present on text inputs
Sample 3 (DeepSeek V3.2)
Instruction set: 2. Detailed Instructions
FAIL | Latency 0.00s
Axe WCAG: 0
Assertions ❌
- ✅: Each text input has an accessible name (R): pass
- ✅: Visible label is included in accessible name (R): pass
- ✅: Each text input has textbox role (R): pass - Text input fields with textbox role found
- ❌: Helper text is programmatically associated (R): fail - Found `Enter your first and last name.`
- ✅: Text inputs are keyboard focusable (R): pass
- ✅: Visual labels are defined and persistent (R): pass
- ✅: Required fields are indicated (visually and programmatically) (R): pass
- ❌: Inputs use appropriate autocomplete for purpose (R): fail
- ✅: Placeholder text is programmatically defined as a property (R): pass - No placeholder text present on text inputs
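The "Helper text is programmatically associated" failure above typically means that visible hint text is not referenced from its input, for example through `aria-describedby`. The report does not describe how the assertion detects helper text, so this sketch assumes helper elements carry a hypothetical `hint` class; `unassociated_helpers` is likewise an illustrative name.

```python
from html.parser import HTMLParser
from typing import List, Optional


class _FormScan(HTMLParser):
    """Records helper element ids and every id referenced by aria-describedby."""

    def __init__(self, helper_class):
        super().__init__()
        self.helper_class = helper_class
        self.helper_ids = []        # id (or None) of each helper element
        self.described_by = set()   # ids referenced from form controls

    def handle_starttag(self, tag, attrs):
        attr = dict(attrs)
        classes = (attr.get("class") or "").split()
        if self.helper_class in classes:
            self.helper_ids.append(attr.get("id"))
        if tag in ("input", "textarea", "select"):
            refs = (attr.get("aria-describedby") or "").split()
            self.described_by.update(refs)


def unassociated_helpers(html, helper_class="hint") -> List[Optional[str]]:
    """Return ids of helper elements that no form control references.

    A helper element without an id cannot be referenced and is reported as None.
    """
    scan = _FormScan(helper_class)
    scan.feed(html)
    return [h for h in scan.helper_ids if h is None or h not in scan.described_by]


failing = (
    '<label for="name">Name</label><input id="name">'
    '<p id="name-hint" class="hint">Enter your first and last name.</p>'
)
passing = failing.replace(
    '<input id="name">', '<input id="name" aria-describedby="name-hint">'
)
print(unassociated_helpers(failing))
print(unassociated_helpers(passing))
```

Adding `aria-describedby="name-hint"` to the input is enough to clear the finding in this sketch.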
Sample 4 (DeepSeek V3.2)
Instruction set: 2. Detailed Instructions
FAIL | Latency 0.00s
Axe WCAG: 0
Assertions ❌
- ✅: Each text input has an accessible name (R): pass
- ✅: Visible label is included in accessible name (R): pass
- ✅: Each text input has textbox role (R): pass - Text input fields with textbox role found
- ✅: Helper text is programmatically associated (R): pass
- ✅: Text inputs are keyboard focusable (R): pass
- ✅: Visual labels are defined and persistent (R): pass
- ✅: Required fields are indicated (visually and programmatically) (R): pass
- ❌: Inputs use appropriate autocomplete for purpose (R): fail
- ✅: Placeholder text is programmatically defined as a property (R): pass - No placeholder text present on text inputs
Claude Haiku 4.5
— 0%
— 0%
— 0%
— 0%
Samples: 25 | Passes: 0
| pass@1 | pass@5 | pass@10 |
|---|---|---|
| 0% | 0% | 0% |
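The report does not state how pass@k is computed. A minimal sketch, assuming the standard combinatorial estimator 1 - C(n-c, k) / C(n, k) for n samples with c passes, is shown below. `pass_at_k` is a hypothetical helper, and clamping k to n when fewer than k samples exist is an assumption made so that small groups still yield a value.

```python
from math import comb


def pass_at_k(n, c, k):
    """Estimate the probability that at least one of k samples passes.

    n: total samples, c: passing samples, k: samples drawn without replacement.
    """
    if n <= 0 or not 0 <= c <= n or k < 1:
        raise ValueError("require n > 0, 0 <= c <= n, k >= 1")
    k = min(k, n)  # assumption: cannot draw more samples than exist
    # comb(n - c, k) is 0 when fewer than k failing samples remain
    return 1.0 - comb(n - c, k) / comb(n, k)


for k in (1, 5, 10):
    print(f"pass@{k}: {pass_at_k(25, 0, k):.0%}")
```

With zero passes every pass@k is 0% regardless of k, which matches the rows above.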
Samples: 5 | Passes: 0
| pass@1 | pass@5 | pass@10 |
|---|---|---|
| 0% | 0% | 0% |
Samples: 5 | Passes: 0
| pass@1 | pass@5 | pass@10 |
|---|---|---|
| 0% | 0% | 0% |
Samples: 5 | Passes: 0
| pass@1 | pass@5 | pass@10 |
|---|---|---|
| 0% | 0% | 0% |
Sample 0 (Claude Haiku 4.5)
Instruction set: Control
FAIL | Latency 0.00s cached
Axe WCAG: 1 | BP: 5 | $0.0051
Assertions ❌
- ✅: Each text input has an accessible name (R): pass
- ✅: Visible label is included in accessible name (R): pass
- ✅: Each text input has textbox role (R): pass - Text input fields with textbox role found
- ❌: Helper text is programmatically associated (R): fail - Found `Please enter a valid email address (e.g., name@example.com)` Found `Preferred format: (123) 456-7890 or 123-456-7890`
- ✅: Text inputs are keyboard focusable (R): pass
- ✅: Visual labels are defined and persistent (R): pass
- ✅: Required fields are indicated (visually and programmatically) (R): pass
- ❌: Inputs use appropriate autocomplete for purpose (R): fail
- ✅: Placeholder text is programmatically defined as a property (R): pass - No placeholder text present on text inputs
Axe WCAG Failures (1) ❌
- (1x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (5) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 1 (Claude Haiku 4.5)
Instruction set: Control
FAIL | Latency 0.00s cached
Axe WCAG: 1 | BP: 5 | $0.0049
Assertions ❌
- ✅: Each text input has an accessible name (R): pass
- ✅: Visible label is included in accessible name (R): pass
- ✅: Each text input has textbox role (R): pass - Text input fields with textbox role found
- ❌: Helper text is programmatically associated (R): fail - Found `Please enter a valid email address (e.g., example@domain.com)` Found `Preferred format: (123) 456-7890 or 123-456-7890`
- ✅: Text inputs are keyboard focusable (R): pass
- ✅: Visual labels are defined and persistent (R): pass
- ✅: Required fields are indicated (visually and programmatically) (R): pass
- ❌: Inputs use appropriate autocomplete for purpose (R): fail
- ✅: Placeholder text is programmatically defined as a property (R): pass - No placeholder text present on text inputs
Axe WCAG Failures (1) ❌
- (1x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (5) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 2 (Claude Haiku 4.5)
Instruction set: Control
FAIL | Latency 0.00s cached
Axe WCAG: 3 | BP: 5 | $0.0052
Assertions ❌
- ✅: Each text input has an accessible name (R): pass
- ✅: Visible label is included in accessible name (R): pass
- ✅: Each text input has textbox role (R): pass - Text input fields with textbox role found
- ❌: Helper text is programmatically associated (R): fail - Found `Please enter a valid email address (e.g., name@example.com)` Found `Preferred format: (123) 456-7890 or 123-456-7890`
- ✅: Text inputs are keyboard focusable (R): pass
- ✅: Visual labels are defined and persistent (R): pass
- ✅: Required fields are indicated (visually and programmatically) (R): pass
- ❌: Inputs use appropriate autocomplete for purpose (R): fail
- ✅: Placeholder text is programmatically defined as a property (R): pass - No placeholder text present on text inputs
Axe WCAG Failures (3) ❌
- (3x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (5) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 3 (Claude Haiku 4.5)
Instruction set: Control
FAIL | Latency 0.00s cached
Axe WCAG: 3 | BP: 5 | $0.0053
Assertions ❌
- ✅: Each text input has an accessible name (R): pass
- ✅: Visible label is included in accessible name (R): pass
- ✅: Each text input has textbox role (R): pass - Text input fields with textbox role found
- ❌: Helper text is programmatically associated (R): fail - Found `Please enter a valid email address (e.g., example@domain.com)` Found `Preferred format: (123) 456-7890 or 123-456-7890`
- ✅: Text inputs are keyboard focusable (R): pass
- ✅: Visual labels are defined and persistent (R): pass
- ✅: Required fields are indicated (visually and programmatically) (R): pass
- ❌: Inputs use appropriate autocomplete for purpose (R): fail
- ✅: Placeholder text is programmatically defined as a property (R): pass - No placeholder text present on text inputs
Axe WCAG Failures (3) ❌
- (3x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (5) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 4 (Claude Haiku 4.5)
Instruction set: Control
FAIL | Latency 0.00s cached
Axe WCAG: 3 | BP: 5 | $0.0052
Assertions ❌
- ✅: Each text input has an accessible name (R): pass
- ✅: Visible label is included in accessible name (R): pass
- ✅: Each text input has textbox role (R): pass - Text input fields with textbox role found
- ❌: Helper text is programmatically associated (R): fail - Found `Please enter a valid email address (e.g., example@domain.com)` Found `Preferred format: (123) 456-7890 or 123-456-7890`
- ✅: Text inputs are keyboard focusable (R): pass
- ✅: Visual labels are defined and persistent (R): pass
- ✅: Required fields are indicated (visually and programmatically) (R): pass
- ❌: Inputs use appropriate autocomplete for purpose (R): fail
- ✅: Placeholder text is programmatically defined as a property (R): pass - No placeholder text present on text inputs
Axe WCAG Failures (3) ❌
- (3x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (5) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 5 (Claude Haiku 4.5)
Instruction set: Control
FAIL | Latency 0.00s cached
Axe WCAG: 1 | BP: 5 | $0.0049
Assertions ❌
- ✅: Each text input has an accessible name (R): pass
- ✅: Visible label is included in accessible name (R): pass
- ✅: Each text input has textbox role (R): pass - Text input fields with textbox role found
- ❌: Helper text is programmatically associated (R): fail - Found `Please enter a valid email address (e.g., example@domain.com)` Found `Preferred format: (123) 456-7890 or 123-456-7890`
- ✅: Text inputs are keyboard focusable (R): pass
- ✅: Visual labels are defined and persistent (R): pass
- ✅: Required fields are indicated (visually and programmatically) (R): pass
- ❌: Inputs use appropriate autocomplete for purpose (R): fail
- ✅: Placeholder text is programmatically defined as a property (R): pass - No placeholder text present on text inputs
Axe WCAG Failures (1) ❌
- (1x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (5) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 6 (Claude Haiku 4.5)
Instruction set: Control
FAIL | Latency 0.00s cached
Axe WCAG: 1 | BP: 5 | $0.0051
Assertions ❌
- ✅: Each text input has an accessible name (R): pass
- ✅: Visible label is included in accessible name (R): pass
- ✅: Each text input has textbox role (R): pass - Text input fields with textbox role found
- ❌: Helper text is programmatically associated (R): fail - Found `Please enter a valid email address (e.g., example@domain.com)` Found `Preferred format: (123) 456-7890 or 123-456-7890`
- ✅: Text inputs are keyboard focusable (R): pass
- ✅: Visual labels are defined and persistent (R): pass
- ✅: Required fields are indicated (visually and programmatically) (R): pass
- ❌: Inputs use appropriate autocomplete for purpose (R): fail
- ✅: Placeholder text is programmatically defined as a property (R): pass - No placeholder text present on text inputs
Axe WCAG Failures (1) ❌
- (1x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (5) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 7 (Claude Haiku 4.5)
Instruction set: Control
FAIL | Latency 0.00s cached
Axe WCAG: 3 | BP: 5 | $0.0052
Assertions ❌
- ✅: Each text input has an accessible name (R): pass
- ✅: Visible label is included in accessible name (R): pass
- ✅: Each text input has textbox role (R): pass - Text input fields with textbox role found
- ❌: Helper text is programmatically associated (R): fail - Found `Please enter a valid email address (e.g., name@example.com)` Found `Preferred format: (123) 456-7890 or 123-456-7890`
- ✅: Text inputs are keyboard focusable (R): pass
- ✅: Visual labels are defined and persistent (R): pass
- ✅: Required fields are indicated (visually and programmatically) (R): pass
- ❌: Inputs use appropriate autocomplete for purpose (R): fail
- ✅: Placeholder text is programmatically defined as a property (R): pass - No placeholder text present on text inputs
Axe WCAG Failures (3) ❌
- (3x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (5) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 8 (Claude Haiku 4.5)
Instruction set: Control
FAIL | Latency 0.00s cached
Axe WCAG: 1 | BP: 5 | $0.0049
Assertions ❌
- ✅: Each text input has an accessible name (R): pass
- ✅: Visible label is included in accessible name (R): pass
- ✅: Each text input has textbox role (R): pass - Text input fields with textbox role found
- ❌: Helper text is programmatically associated (R): fail - Found `Please enter a valid email address (e.g., example@domain.com)` Found `Preferred format: (123) 456-7890 or 123-456-7890`
- ✅: Text inputs are keyboard focusable (R): pass
- ✅: Visual labels are defined and persistent (R): pass
- ✅: Required fields are indicated (visually and programmatically) (R): pass
- ❌: Inputs use appropriate autocomplete for purpose (R): fail
- ✅: Placeholder text is programmatically defined as a property (R): pass - No placeholder text present on text inputs
Axe WCAG Failures (1) ❌
- (1x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (5) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 9 (Claude Haiku 4.5)
Instruction set: Control
FAIL | Latency 0.00s cached
Axe WCAG: 1 | BP: 5 | $0.0051
Assertions ❌
- ✅: Each text input has an accessible name (R): pass
- ✅: Visible label is included in accessible name (R): pass
- ✅: Each text input has textbox role (R): pass - Text input fields with textbox role found
- ❌: Helper text is programmatically associated (R): fail - Found `Please enter a valid email address (e.g., name@example.com)` Found `Preferred format: (123) 456-7890 or 123-456-7890`
- ✅: Text inputs are keyboard focusable (R): pass
- ✅: Visual labels are defined and persistent (R): pass
- ✅: Required fields are indicated (visually and programmatically) (R): pass
- ❌: Inputs use appropriate autocomplete for purpose (R): fail
- ✅: Placeholder text is programmatically defined as a property (R): pass - No placeholder text present on text inputs
Axe WCAG Failures (1) ❌
- (1x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (5) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 10 (Claude Haiku 4.5)
Instruction set: Control
FAIL | Latency 0.00s cached
Axe WCAG: 1 | BP: 5 | $0.0049
Assertions ❌
- ✅: Each text input has an accessible name (R): pass
- ✅: Visible label is included in accessible name (R): pass
- ✅: Each text input has textbox role (R): pass - Text input fields with textbox role found
- ❌: Helper text is programmatically associated (R): fail - Found `Please enter a valid email address (e.g., name@example.com)` Found `Preferred format: (123) 456-7890 or 123-456-7890`
- ✅: Text inputs are keyboard focusable (R): pass
- ✅: Visual labels are defined and persistent (R): pass
- ✅: Required fields are indicated (visually and programmatically) (R): pass
- ❌: Inputs use appropriate autocomplete for purpose (R): fail
- ✅: Placeholder text is programmatically defined as a property (R): pass - No placeholder text present on text inputs
Axe WCAG Failures (1) ❌
- (1x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (5) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 11 (Claude Haiku 4.5)
Instruction set: Control
FAIL | Latency 0.00s cached
Axe WCAG: 1 | BP: 5 | $0.0049
Assertions ❌
- ✅: Each text input has an accessible name (R): pass
- ✅: Visible label is included in accessible name (R): pass
- ✅: Each text input has textbox role (R): pass - Text input fields with textbox role found
- ❌: Helper text is programmatically associated (R): fail - Found `Please enter a valid email address (e.g., example@domain.com)` Found `Preferred format: (123) 456-7890 or 123-456-7890`
- ✅: Text inputs are keyboard focusable (R): pass
- ✅: Visual labels are defined and persistent (R): pass
- ✅: Required fields are indicated (visually and programmatically) (R): pass
- ❌: Inputs use appropriate autocomplete for purpose (R): fail
- ✅: Placeholder text is programmatically defined as a property (R): pass - No placeholder text present on text inputs
Axe WCAG Failures (1) ❌
- (1x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (5) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 12 (Claude Haiku 4.5)
Instruction set: Control
FAIL | Latency 0.00s cached
Axe WCAG: 1 | BP: 5 | $0.0051
Assertions ❌
- ✅: Each text input has an accessible name (R): pass
- ✅: Visible label is included in accessible name (R): pass
- ✅: Each text input has textbox role (R): pass - Text input fields with textbox role found
- ❌: Helper text is programmatically associated (R): fail - Found `Please enter a valid email address (e.g., example@domain.com)` Found `Preferred format: (123) 456-7890 or 123-456-7890`
- ✅: Text inputs are keyboard focusable (R): pass
- ✅: Visual labels are defined and persistent (R): pass
- ✅: Required fields are indicated (visually and programmatically) (R): pass
- ❌: Inputs use appropriate autocomplete for purpose (R): fail
- ✅: Placeholder text is programmatically defined as a property (R): pass - No placeholder text present on text inputs
Axe WCAG Failures (1) ❌
- (1x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (5) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 13 (Claude Haiku 4.5)
Instruction set: Control
FAIL | Latency 0.00s cached
Axe WCAG: 1 | BP: 5 | $0.0049
Assertions ❌
- ✅: Each text input has an accessible name (R): pass
- ✅: Visible label is included in accessible name (R): pass
- ✅: Each text input has textbox role (R): pass - Text input fields with textbox role found
- ❌: Helper text is programmatically associated (R): fail - Found `Please enter a valid email address (e.g., name@example.com)` Found `Preferred format: (123) 456-7890 or 123-456-7890`
- ✅: Text inputs are keyboard focusable (R): pass
- ✅: Visual labels are defined and persistent (R): pass
- ✅: Required fields are indicated (visually and programmatically) (R): pass
- ❌: Inputs use appropriate autocomplete for purpose (R): fail
- ✅: Placeholder text is programmatically defined as a property (R): pass - No placeholder text present on text inputs
Axe WCAG Failures (1) ❌
- (1x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (5) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 14 (Claude Haiku 4.5)
Instruction set: Control
FAIL | Latency 0.00s cached
Axe WCAG: 1 | BP: 5 | $0.0049
Assertions ❌
- ✅: Each text input has an accessible name (R): pass
- ✅: Visible label is included in accessible name (R): pass
- ✅: Each text input has textbox role (R): pass - Text input fields with textbox role found
- ❌: Helper text is programmatically associated (R): fail - Found `Please enter a valid email address (e.g., example@domain.com)` Found `Preferred format: (123) 456-7890 or 123-456-7890`
- ✅: Text inputs are keyboard focusable (R): pass
- ✅: Visual labels are defined and persistent (R): pass
- ✅: Required fields are indicated (visually and programmatically) (R): pass
- ❌: Inputs use appropriate autocomplete for purpose (R): fail
- ✅: Placeholder text is programmatically defined as a property (R): pass - No placeholder text present on text inputs
Axe WCAG Failures (1) ❌
- (1x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (5) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 15 (Claude Haiku 4.5)
Instruction set: Control
FAIL | Latency 0.00s cached
Axe WCAG: 1 | BP: 5 | $0.0049
Assertions ❌
- ✅: Each text input has an accessible name (R): pass
- ✅: Visible label is included in accessible name (R): pass
- ✅: Each text input has textbox role (R): pass - Text input fields with textbox role found
- ❌: Helper text is programmatically associated (R): fail - Found `Please enter a valid email address (e.g., name@example.com)` Found `Preferred format: (123) 456-7890 or 123-456-7890`
- ✅: Text inputs are keyboard focusable (R): pass
- ✅: Visual labels are defined and persistent (R): pass
- ✅: Required fields are indicated (visually and programmatically) (R): pass
- ❌: Inputs use appropriate autocomplete for purpose (R): fail
- ✅: Placeholder text is programmatically defined as a property (R): pass - No placeholder text present on text inputs
Axe WCAG Failures (1) ❌
- (1x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (5) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 16 (Claude Haiku 4.5)
Instruction set: Control
FAIL | Latency 0.00s cached
Axe WCAG: 3 | BP: 5 | $0.0052
Assertions ❌
- ✅: Each text input has an accessible name (R): pass
- ✅: Visible label is included in accessible name (R): pass
- ✅: Each text input has textbox role (R): pass - Text input fields with textbox role found
- ❌: Helper text is programmatically associated (R): fail - Found `Please enter a valid email address (e.g., example@domain.com)` Found `Preferred format: (123) 456-7890 or 123-456-7890`
- ✅: Text inputs are keyboard focusable (R): pass
- ✅: Visual labels are defined and persistent (R): pass
- ✅: Required fields are indicated (visually and programmatically) (R): pass
- ❌: Inputs use appropriate autocomplete for purpose (R): fail
- ✅: Placeholder text is programmatically defined as a property (R): pass - No placeholder text present on text inputs
Axe WCAG Failures (3) ❌
- (3x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (5) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 17 (Claude Haiku 4.5)
Instruction set: Control
FAIL | Latency 0.00s cached
Axe WCAG: 1 | BP: 5 | $0.0049
Assertions ❌
- ✅: Each text input has an accessible name (R): pass
- ✅: Visible label is included in accessible name (R): pass
- ✅: Each text input has textbox role (R): pass - Text input fields with textbox role found
- ❌: Helper text is programmatically associated (R): fail - Found `Please enter a valid email address (e.g., example@domain.com)` Found `Preferred format: (123) 456-7890 or 123-456-7890`
- ✅: Text inputs are keyboard focusable (R): pass
- ✅: Visual labels are defined and persistent (R): pass
- ✅: Required fields are indicated (visually and programmatically) (R): pass
- ❌: Inputs use appropriate autocomplete for purpose (R): fail
- ✅: Placeholder text is programmatically defined as a property (R): pass - No placeholder text present on text inputs
Axe WCAG Failures (1) ❌
- (1x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (5) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 18 (Claude Haiku 4.5)
Instruction set: Control
FAIL | Latency 0.00s cached
Axe WCAG: 1 | BP: 5 | $0.0049
Assertions ❌
- ✅: Each text input has an accessible name (R): pass
- ✅: Visible label is included in accessible name (R): pass
- ✅: Each text input has textbox role (R): pass - Text input fields with textbox role found
- ❌: Helper text is programmatically associated (R): fail - Found `Please enter a valid email address (e.g., example@domain.com)` Found `Preferred format: (123) 456-7890 or 123-456-7890`
- ✅: Text inputs are keyboard focusable (R): pass
- ✅: Visual labels are defined and persistent (R): pass
- ✅: Required fields are indicated (visually and programmatically) (R): pass
- ❌: Inputs use appropriate autocomplete for purpose (R): fail
- ✅: Placeholder text is programmatically defined as a property (R): pass - No placeholder text present on text inputs
Axe WCAG Failures (1) ❌
- (1x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (5) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 19 (Claude Haiku 4.5)
Instruction set: Control
FAIL | Latency 0.00s cached
Axe WCAG: 1 | BP: 5 | $0.0049
Assertions ❌
- ✅: Each text input has an accessible name (R): pass
- ✅: Visible label is included in accessible name (R): pass
- ✅: Each text input has textbox role (R): pass - Text input fields with textbox role found
- ❌: Helper text is programmatically associated (R): fail - Found `Please enter a valid email address (e.g., name@example.com)` Found `Preferred format: (123) 456-7890 or 123-456-7890`
- ✅: Text inputs are keyboard focusable (R): pass
- ✅: Visual labels are defined and persistent (R): pass
- ✅: Required fields are indicated (visually and programmatically) (R): pass
- ❌: Inputs use appropriate autocomplete for purpose (R): fail
- ✅: Placeholder text is programmatically defined as a property (R): pass - No placeholder text present on text inputs
Axe WCAG Failures (1) ❌
- (1x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (5) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 20 (Claude Haiku 4.5)
Instruction set: Control
FAIL | Latency 0.00s cached
Axe WCAG: 1 | BP: 5 | $0.0049
Assertions ❌
- ✅: Each text input has an accessible name (R): pass
- ✅: Visible label is included in accessible name (R): pass
- ✅: Each text input has textbox role (R): pass - Text input fields with textbox role found
- ❌: Helper text is programmatically associated (R): fail - Found `Please enter a valid email address (e.g., example@domain.com)` Found `Preferred format: (123) 456-7890 or 123-456-7890`
- ✅: Text inputs are keyboard focusable (R): pass
- ✅: Visual labels are defined and persistent (R): pass
- ✅: Required fields are indicated (visually and programmatically) (R): pass
- ❌: Inputs use appropriate autocomplete for purpose (R): fail
- ✅: Placeholder text is programmatically defined as a property (R): pass - No placeholder text present on text inputs
Axe WCAG Failures (1) ❌
- (1x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (5) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 21 (Claude Haiku 4.5)
Instruction set: Control
FAIL | Latency 0.00s cached
Axe WCAG: 1 | BP: 5 | $0.0049
Assertions ❌
- ✅: Each text input has an accessible name (R): pass
- ✅: Visible label is included in accessible name (R): pass
- ✅: Each text input has textbox role (R): pass - Text input fields with textbox role found
- ❌: Helper text is programmatically associated (R): fail - Found `Please enter a valid email address (e.g., example@domain.com)` Found `Preferred format: (123) 456-7890 or 123-456-7890`
- ✅: Text inputs are keyboard focusable (R): pass
- ✅: Visual labels are defined and persistent (R): pass
- ✅: Required fields are indicated (visually and programmatically) (R): pass
- ❌: Inputs use appropriate autocomplete for purpose (R): fail
- ✅: Placeholder text is programmatically defined as a property (R): pass - No placeholder text present on text inputs
Axe WCAG Failures (1) ❌
- (1x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (5) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 22 (Claude Haiku 4.5)
Instruction set: Control
FAIL | Latency 0.00s cached
Axe WCAG: 3 | BP: 5 | $0.0052
Assertions ❌
- ✅: Each text input has an accessible name (R): pass
- ✅: Visible label is included in accessible name (R): pass
- ✅: Each text input has textbox role (R): pass - Text input fields with textbox role found
- ❌: Helper text is programmatically associated (R): fail - Found `Please enter a valid email address (e.g., example@domain.com)`; Found `Preferred format: (123) 456-7890 or 123-456-7890`
- ✅: Text inputs are keyboard focusable (R): pass
- ✅: Visual labels are defined and persistent (R): pass
- ✅: Required fields are indicated (visually and programmatically) (R): pass
- ❌: Inputs use appropriate autocomplete for purpose (R): fail
- ✅: Placeholder text is programmatically defined as a property (R): pass - No placeholder text present on text inputs
Axe WCAG Failures (3) ❌
- (3x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (5) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 23 (Claude Haiku 4.5)
Instruction set: Control
FAIL | Latency 0.00s cached
Axe WCAG: 1 | BP: 5 | $0.0049
Assertions ❌
- ✅: Each text input has an accessible name (R): pass
- ✅: Visible label is included in accessible name (R): pass
- ✅: Each text input has textbox role (R): pass - Text input fields with textbox role found
- ❌: Helper text is programmatically associated (R): fail - Found `Please enter a valid email address (e.g., example@domain.com)`; Found `Preferred format: (123) 456-7890 or 123-456-7890`
- ✅: Text inputs are keyboard focusable (R): pass
- ✅: Visual labels are defined and persistent (R): pass
- ✅: Required fields are indicated (visually and programmatically) (R): pass
- ❌: Inputs use appropriate autocomplete for purpose (R): fail
- ✅: Placeholder text is programmatically defined as a property (R): pass - No placeholder text present on text inputs
Axe WCAG Failures (1) ❌
- (1x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (5) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 24 (Claude Haiku 4.5)
Instruction set: Control
FAIL | Latency 0.00s cached
Axe WCAG: 1 | BP: 5 | $0.0049
Assertions ❌
- ✅: Each text input has an accessible name (R): pass
- ✅: Visible label is included in accessible name (R): pass
- ✅: Each text input has textbox role (R): pass - Text input fields with textbox role found
- ❌: Helper text is programmatically associated (R): fail - Found `Please enter a valid email address (e.g., example@domain.com)`; Found `Preferred format: (123) 456-7890 or 123-456-7890`
- ✅: Text inputs are keyboard focusable (R): pass
- ✅: Visual labels are defined and persistent (R): pass
- ✅: Required fields are indicated (visually and programmatically) (R): pass
- ❌: Inputs use appropriate autocomplete for purpose (R): fail
- ✅: Placeholder text is programmatically defined as a property (R): pass - No placeholder text present on text inputs
Axe WCAG Failures (1) ❌
- (1x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (5) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 0 (Claude Haiku 4.5)
Instruction set: 1. Basic
FAIL | Latency 0.00s cached
Axe WCAG: 0 | $0.0069
Assertions ❌
- ✅: Each text input has an accessible name (R): pass
- ✅: Visible label is included in accessible name (R): pass
- ✅: Each text input has textbox role (R): pass - Text input fields with textbox role found
- ❌: Helper text is programmatically associated (R): fail - Found `John Doe`
- ✅: Text inputs are keyboard focusable (R): pass
- ✅: Visual labels are defined and persistent (R): pass
- ✅: Required fields are indicated (visually and programmatically) (R): pass
- ❌: Inputs use appropriate autocomplete for purpose (R): fail
- ✅: Placeholder text is programmatically defined as a property (R): pass
Sample 1 (Claude Haiku 4.5)
Instruction set: 1. Basic
FAIL | Latency 0.00s cached
Axe WCAG: 0 | $0.0070
Assertions ❌
- ✅: Each text input has an accessible name (R): pass
- ✅: Visible label is included in accessible name (R): pass
- ✅: Each text input has textbox role (R): pass - Text input fields with textbox role found
- ❌: Helper text is programmatically associated (R): fail - Found `John Doe`
- ✅: Text inputs are keyboard focusable (R): pass
- ✅: Visual labels are defined and persistent (R): pass
- ✅: Required fields are indicated (visually and programmatically) (R): pass
- ❌: Inputs use appropriate autocomplete for purpose (R): fail
- ✅: Placeholder text is programmatically defined as a property (R): pass
Sample 2 (Claude Haiku 4.5)
Instruction set: 1. Basic
FAIL | Latency 0.00s cached
Axe WCAG: 0 | $0.0070
Assertions ❌
- ✅: Each text input has an accessible name (R): pass
- ✅: Visible label is included in accessible name (R): pass
- ✅: Each text input has textbox role (R): pass - Text input fields with textbox role found
- ❌: Helper text is programmatically associated (R): fail - Found `John Doe`
- ✅: Text inputs are keyboard focusable (R): pass
- ✅: Visual labels are defined and persistent (R): pass
- ✅: Required fields are indicated (visually and programmatically) (R): pass
- ❌: Inputs use appropriate autocomplete for purpose (R): fail
- ✅: Placeholder text is programmatically defined as a property (R): pass
Sample 3 (Claude Haiku 4.5)
Instruction set: 1. Basic
FAIL | Latency 0.00s cached
Axe WCAG: 0 | $0.0070
Assertions ❌
- ✅: Each text input has an accessible name (R): pass
- ✅: Visible label is included in accessible name (R): pass
- ✅: Each text input has textbox role (R): pass - Text input fields with textbox role found
- ❌: Helper text is programmatically associated (R): fail - Found `John Doe`
- ✅: Text inputs are keyboard focusable (R): pass
- ✅: Visual labels are defined and persistent (R): pass
- ✅: Required fields are indicated (visually and programmatically) (R): pass
- ❌: Inputs use appropriate autocomplete for purpose (R): fail
- ✅: Placeholder text is programmatically defined as a property (R): pass
Sample 4 (Claude Haiku 4.5)
Instruction set: 1. Basic
FAIL | Latency 0.00s cached
Axe WCAG: 0 | $0.0073
Assertions ❌
- ✅: Each text input has an accessible name (R): pass
- ✅: Visible label is included in accessible name (R): pass
- ✅: Each text input has textbox role (R): pass - Text input fields with textbox role found
- ❌: Helper text is programmatically associated (R): fail - Found `John Doe`
- ✅: Text inputs are keyboard focusable (R): pass
- ✅: Visual labels are defined and persistent (R): pass
- ✅: Required fields are indicated (visually and programmatically) (R): pass
- ❌: Inputs use appropriate autocomplete for purpose (R): fail
- ✅: Placeholder text is programmatically defined as a property (R): pass
Sample 0 (Claude Haiku 4.5)
Instruction set: 0. Minimal
FAIL | Latency 0.00s cached
Axe WCAG: 0 | BP: 5 | $0.0071
Assertions ❌
- ✅: Each text input has an accessible name (R): pass
- ✅: Visible label is included in accessible name (R): pass
- ✅: Each text input has textbox role (R): pass - Text input fields with textbox role found
- ❌: Helper text is programmatically associated (R): fail - Found `John Doe`
- ✅: Text inputs are keyboard focusable (R): pass
- ✅: Visual labels are defined and persistent (R): pass
- ✅: Required fields are indicated (visually and programmatically) (R): pass
- ❌: Inputs use appropriate autocomplete for purpose (R): fail
- ✅: Placeholder text is programmatically defined as a property (R): pass
Axe Best Practice Issues (5) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 1 (Claude Haiku 4.5)
Instruction set: 0. Minimal
FAIL | Latency 0.00s cached
Axe WCAG: 0 | BP: 5 | $0.0072
Assertions ❌
- ✅: Each text input has an accessible name (R): pass
- ✅: Visible label is included in accessible name (R): pass
- ✅: Each text input has textbox role (R): pass - Text input fields with textbox role found
- ❌: Helper text is programmatically associated (R): fail - Found `John Doe`
- ✅: Text inputs are keyboard focusable (R): pass
- ✅: Visual labels are defined and persistent (R): pass
- ✅: Required fields are indicated (visually and programmatically) (R): pass
- ❌: Inputs use appropriate autocomplete for purpose (R): fail
- ✅: Placeholder text is programmatically defined as a property (R): pass
Axe Best Practice Issues (5) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 2 (Claude Haiku 4.5)
Instruction set: 0. Minimal
FAIL | Latency 0.00s cached
Axe WCAG: 0 | BP: 5 | $0.0070
Assertions ❌
- ✅: Each text input has an accessible name (R): pass
- ✅: Visible label is included in accessible name (R): pass
- ✅: Each text input has textbox role (R): pass - Text input fields with textbox role found
- ❌: Helper text is programmatically associated (R): fail - Found `John Doe`
- ✅: Text inputs are keyboard focusable (R): pass
- ✅: Visual labels are defined and persistent (R): pass
- ✅: Required fields are indicated (visually and programmatically) (R): pass
- ❌: Inputs use appropriate autocomplete for purpose (R): fail
- ✅: Placeholder text is programmatically defined as a property (R): pass
Axe Best Practice Issues (5) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 3 (Claude Haiku 4.5)
Instruction set: 0. Minimal
FAIL | Latency 0.00s cached
Axe WCAG: 0 | BP: 5 | $0.0072
Assertions ❌
- ✅: Each text input has an accessible name (R): pass
- ✅: Visible label is included in accessible name (R): pass
- ✅: Each text input has textbox role (R): pass - Text input fields with textbox role found
- ❌: Helper text is programmatically associated (R): fail - Found `John Doe`
- ✅: Text inputs are keyboard focusable (R): pass
- ✅: Visual labels are defined and persistent (R): pass
- ✅: Required fields are indicated (visually and programmatically) (R): pass
- ❌: Inputs use appropriate autocomplete for purpose (R): fail
- ✅: Placeholder text is programmatically defined as a property (R): pass
Axe Best Practice Issues (5) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 4 (Claude Haiku 4.5)
Instruction set: 0. Minimal
FAIL | Latency 0.00s cached
Axe WCAG: 0 | BP: 5 | $0.0071
Assertions ❌
- ✅: Each text input has an accessible name (R): pass
- ✅: Visible label is included in accessible name (R): pass
- ✅: Each text input has textbox role (R): pass - Text input fields with textbox role found
- ❌: Helper text is programmatically associated (R): fail - Found `John Doe`
- ✅: Text inputs are keyboard focusable (R): pass
- ✅: Visual labels are defined and persistent (R): pass
- ✅: Required fields are indicated (visually and programmatically) (R): pass
- ❌: Inputs use appropriate autocomplete for purpose (R): fail
- ✅: Placeholder text is programmatically defined as a property (R): pass
Axe Best Practice Issues (5) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 0 (Claude Haiku 4.5)
Instruction set: 2. Detailed Instructions
FAIL | Latency 0.00s cached
Axe WCAG: 0 | $0.0147
Assertions ❌
- ✅: Each text input has an accessible name (R): pass
- ✅: Visible label is included in accessible name (R): pass
- ✅: Each text input has textbox role (R): pass - Text input fields with textbox role found
- ❌: Helper text is programmatically associated (R): fail - Found `John Doe`
- ✅: Text inputs are keyboard focusable (R): pass
- ✅: Visual labels are defined and persistent (R): pass
- ✅: Required fields are indicated (visually and programmatically) (R): pass
- ❌: Inputs use appropriate autocomplete for purpose (R): fail
- ✅: Placeholder text is programmatically defined as a property (R): pass
Sample 1 (Claude Haiku 4.5)
Instruction set: 2. Detailed Instructions
FAIL | Latency 0.00s cached
Axe WCAG: 0 | $0.0160
Assertions ❌
- ✅: Each text input has an accessible name (R): pass
- ✅: Visible label is included in accessible name (R): pass
- ✅: Each text input has textbox role (R): pass - Text input fields with textbox role found
- ❌: Helper text is programmatically associated (R): fail - Found `John Doe`
- ✅: Text inputs are keyboard focusable (R): pass
- ✅: Visual labels are defined and persistent (R): pass
- ✅: Required fields are indicated (visually and programmatically) (R): pass
- ❌: Inputs use appropriate autocomplete for purpose (R): fail
- ✅: Placeholder text is programmatically defined as a property (R): pass
Sample 2 (Claude Haiku 4.5)
Instruction set: 2. Detailed Instructions
FAIL | Latency 15.77s
Axe WCAG: 0 | $0.0206
Assertions ❌
- ✅: Each text input has an accessible name (R): pass
- ✅: Visible label is included in accessible name (R): pass
- ✅: Each text input has textbox role (R): pass - Text input fields with textbox role found
- ❌: Helper text is programmatically associated (R): fail - Found `John Doe`
- ✅: Text inputs are keyboard focusable (R): pass
- ✅: Visual labels are defined and persistent (R): pass
- ✅: Required fields are indicated (visually and programmatically) (R): pass
- ❌: Inputs use appropriate autocomplete for purpose (R): fail
- ✅: Placeholder text is programmatically defined as a property (R): pass
Sample 3 (Claude Haiku 4.5)
Instruction set: 2. Detailed Instructions
FAIL | Latency 12.35s
Axe WCAG: 0 | $0.0157
Assertions ❌
- ✅: Each text input has an accessible name (R): pass
- ✅: Visible label is included in accessible name (R): pass
- ✅: Each text input has textbox role (R): pass - Text input fields with textbox role found
- ❌: Helper text is programmatically associated (R): fail - Found `John Doe`
- ✅: Text inputs are keyboard focusable (R): pass
- ✅: Visual labels are defined and persistent (R): pass
- ✅: Required fields are indicated (visually and programmatically) (R): pass
- ❌: Inputs use appropriate autocomplete for purpose (R): fail
- ✅: Placeholder text is programmatically defined as a property (R): pass
Sample 4 (Claude Haiku 4.5)
Instruction set: 2. Detailed Instructions
FAIL | Latency 11.37s
Axe WCAG: 0 | $0.0153
Assertions ❌
- ✅: Each text input has an accessible name (R): pass
- ✅: Visible label is included in accessible name (R): pass
- ✅: Each text input has textbox role (R): pass - Text input fields with textbox role found
- ❌: Helper text is programmatically associated (R): fail - Found `John Doe`
- ✅: Text inputs are keyboard focusable (R): pass
- ✅: Visual labels are defined and persistent (R): pass
- ✅: Required fields are indicated (visually and programmatically) (R): pass
- ❌: Inputs use appropriate autocomplete for purpose (R): fail
- ✅: Placeholder text is programmatically defined as a property (R): pass
Claude Opus 4.6
Pass rate by instruction set: 0%, 60%, 100%, 100%
Aggregates are shown when filtering to a specific instruction set.
Samples: 25 | Passes: 0
| pass@1 | pass@5 | pass@10 |
|---|---|---|
| 0% | 0% | 0% |
Samples: 5 | Passes: 3
| pass@1 | pass@5 | pass@10 |
|---|---|---|
| 60% | 100% | 100% |
Samples: 5 | Passes: 5
| pass@1 | pass@5 | pass@10 |
|---|---|---|
| 100% | 100% | 100% |
Samples: 5 | Passes: 5
| pass@1 | pass@5 | pass@10 |
|---|---|---|
| 100% | 100% | 100% |
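The report does not state how pass@k is computed. The sketch below assumes the commonly used unbiased estimator, 1 - C(n-c, k) / C(n, k), where n is the number of samples and c the number of passes; it reproduces the aggregate figures shown above. The function name and the handling of k larger than n (as in pass@10 with only 5 samples) are assumptions.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Estimate the chance that at least one of k samples, drawn without
    replacement from n samples of which c passed, is a passing sample."""
    if not 0 <= c <= n or k < 1:
        raise ValueError("need 0 <= c <= n and k >= 1")
    if c == 0:
        return 0.0  # no passing samples, so no draw can contain one
    if n - c < k:
        # Fewer than k failures exist, so any k draws include a pass.
        # Assumption: this branch also covers k > n.
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)


# (samples, passes) pairs taken from the aggregate tables above
for n, c in [(25, 0), (5, 3), (5, 5)]:
    rates = [f"{pass_at_k(n, c, k):.0%}" for k in (1, 5, 10)]
    print(f"Samples: {n} | Passes: {c} ->", rates)
```

With 5 samples and 3 passes this gives 60% for pass@1 and 100% for pass@5 and pass@10, matching the table; with 25 samples and 0 passes every value is 0%.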
Sample 0 (Claude Opus 4.6)
Instruction set: Control
FAIL | Latency 0.00s cached
Axe WCAG: 1 | BP: 5 | $0.0161
Assertions ❌
- ✅: Each text input has an accessible name (R): pass
- ✅: Visible label is included in accessible name (R): pass
- ✅: Each text input has textbox role (R): pass - Text input fields with textbox role found
- ❌: Helper text is programmatically associated (R): fail - Found `Please enter your email in the format: example@domain.com`; Found `Preferred format: (123) 456-7890`
- ✅: Text inputs are keyboard focusable (R): pass
- ✅: Visual labels are defined and persistent (R): pass
- ❌: Required fields are indicated (visually and programmatically) (R): fail - (2x) Input is programmatically required but has no visual required indicator
- ❌: Inputs use appropriate autocomplete for purpose (R): fail
- ✅: Placeholder text is programmatically defined as a property (R): pass - No placeholder text present on text inputs
Axe WCAG Failures (1) ❌
- (1x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (5) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 1 (Claude Opus 4.6)
Instruction set: Control
FAIL | Latency 0.00s cached
Axe WCAG: 1 | BP: 5 | $0.0161
Assertions ❌
- ✅: Each text input has an accessible name (R): pass
- ✅: Visible label is included in accessible name (R): pass
- ✅: Each text input has textbox role (R): pass - Text input fields with textbox role found
- ❌: Helper text is programmatically associated (R): fail - Found `Please enter your email in the format: example@domain.com`; Found `Preferred format: (123) 456-7890`
- ✅: Text inputs are keyboard focusable (R): pass
- ✅: Visual labels are defined and persistent (R): pass
- ❌: Required fields are indicated (visually and programmatically) (R): fail - (2x) Input is programmatically required but has no visual required indicator
- ❌: Inputs use appropriate autocomplete for purpose (R): fail
- ✅: Placeholder text is programmatically defined as a property (R): pass - No placeholder text present on text inputs
Axe WCAG Failures (1) ❌
- (1x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (5) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 2 (Claude Opus 4.6)
Instruction set: Control
FAIL | Latency 0.00s cached
Axe WCAG: 1 | BP: 5 | $0.0175
Assertions ❌
- ✅: Each text input has an accessible name (R): pass
- ✅: Visible label is included in accessible name (R): pass
- ✅: Each text input has textbox role (R): pass - Text input fields with textbox role found
- ✅: Helper text is programmatically associated (R): pass
- ✅: Text inputs are keyboard focusable (R): pass
- ✅: Visual labels are defined and persistent (R): pass
- ✅: Required fields are indicated (visually and programmatically) (R): pass
- ❌: Inputs use appropriate autocomplete for purpose (R): fail
- ✅: Placeholder text is programmatically defined as a property (R): pass - No placeholder text present on text inputs
Axe WCAG Failures (1) ❌
- (1x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (5) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 3 (Claude Opus 4.6)
Instruction set: Control
FAIL | Latency 0.00s cached
Axe WCAG: 1 | BP: 5 | $0.0159
Assertions ❌
- ✅: Each text input has an accessible name (R): pass
- ✅: Visible label is included in accessible name (R): pass
- ✅: Each text input has textbox role (R): pass - Text input fields with textbox role found
- ❌: Helper text is programmatically associated (R): fail - Found `Please enter your email in the format: example@domain.com`; Found `Preferred format: (123) 456-7890`
- ✅: Text inputs are keyboard focusable (R): pass
- ✅: Visual labels are defined and persistent (R): pass
- ❌: Required fields are indicated (visually and programmatically) (R): fail - (2x) Input is programmatically required but has no visual required indicator
- ❌: Inputs use appropriate autocomplete for purpose (R): fail
- ✅: Placeholder text is programmatically defined as a property (R): pass - No placeholder text present on text inputs
Axe WCAG Failures (1) ❌
- (1x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (5) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 4 (Claude Opus 4.6)
Instruction set: Control
FAIL | Latency 0.00s cached
Axe WCAG: 1 | BP: 5 | $0.0175
Assertions ❌
- ✅: Each text input has an accessible name (R): pass
- ✅: Visible label is included in accessible name (R): pass
- ✅: Each text input has textbox role (R): pass - Text input fields with textbox role found
- ✅: Helper text is programmatically associated (R): pass
- ✅: Text inputs are keyboard focusable (R): pass
- ✅: Visual labels are defined and persistent (R): pass
- ✅: Required fields are indicated (visually and programmatically) (R): pass
- ❌: Inputs use appropriate autocomplete for purpose (R): fail
- ✅: Placeholder text is programmatically defined as a property (R): pass - No placeholder text present on text inputs
Axe WCAG Failures (1) ❌
- (1x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (5) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 5 (Claude Opus 4.6)
Instruction set: Control
FAIL | Latency 0.00s cached
Axe WCAG: 1 | BP: 5 | $0.0175
Assertions ❌
- ✅: Each text input has an accessible name (R): pass
- ✅: Visible label is included in accessible name (R): pass
- ✅: Each text input has textbox role (R): pass - Text input fields with textbox role found
- ✅: Helper text is programmatically associated (R): pass
- ✅: Text inputs are keyboard focusable (R): pass
- ✅: Visual labels are defined and persistent (R): pass
- ✅: Required fields are indicated (visually and programmatically) (R): pass
- ❌: Inputs use appropriate autocomplete for purpose (R): fail
- ✅: Placeholder text is programmatically defined as a property (R): pass - No placeholder text present on text inputs
Axe WCAG Failures (1) ❌
- (1x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (5) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 6 (Claude Opus 4.6)
Instruction set: Control
FAIL | Latency 0.00s cached
Axe WCAG: 1 | BP: 5 | $0.0172
Assertions ❌
- ✅: Each text input has an accessible name (R): pass
- ✅: Visible label is included in accessible name (R): pass
- ✅: Each text input has textbox role (R): pass - Text input fields with textbox role found
- ✅: Helper text is programmatically associated (R): pass
- ✅: Text inputs are keyboard focusable (R): pass
- ✅: Visual labels are defined and persistent (R): pass
- ✅: Required fields are indicated (visually and programmatically) (R): pass
- ❌: Inputs use appropriate autocomplete for purpose (R): fail
- ✅: Placeholder text is programmatically defined as a property (R): pass - No placeholder text present on text inputs
Axe WCAG Failures (1) ❌
- (1x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (5) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 7 (Claude Opus 4.6)
Instruction set: Control
FAIL | Latency 0.00s cached
Axe WCAG: 1 | BP: 5 | $0.0175
Assertions ❌
- ✅: Each text input has an accessible name (R): pass
- ✅: Visible label is included in accessible name (R): pass
- ✅: Each text input has textbox role (R): pass - Text input fields with textbox role found
- ✅: Helper text is programmatically associated (R): pass
- ✅: Text inputs are keyboard focusable (R): pass
- ✅: Visual labels are defined and persistent (R): pass
- ✅: Required fields are indicated (visually and programmatically) (R): pass
- ❌: Inputs use appropriate autocomplete for purpose (R): fail
- ✅: Placeholder text is programmatically defined as a property (R): pass - No placeholder text present on text inputs
Axe WCAG Failures (1) ❌
- (1x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (5) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 8 (Claude Opus 4.6)
Instruction set: Control
FAIL | Latency 0.00s cached
Axe WCAG: 1 | BP: 5 | $0.0175
Assertions ❌
- ✅: Each text input has an accessible name (R): pass
- ✅: Visible label is included in accessible name (R): pass
- ✅: Each text input has textbox role (R): pass - Text input fields with textbox role found
- ✅: Helper text is programmatically associated (R): pass
- ✅: Text inputs are keyboard focusable (R): pass
- ✅: Visual labels are defined and persistent (R): pass
- ✅: Required fields are indicated (visually and programmatically) (R): pass
- ❌: Inputs use appropriate autocomplete for purpose (R): fail
- ✅: Placeholder text is programmatically defined as a property (R): pass - No placeholder text present on text inputs
Axe WCAG Failures (1) ❌
- (1x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (5) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 9 (Claude Opus 4.6)
Instruction set: Control
FAIL | Latency 0.00s cached
Axe WCAG: 1 | BP: 5 | $0.0175
Assertions ❌
- ✅: Each text input has an accessible name (R): pass
- ✅: Visible label is included in accessible name (R): pass
- ✅: Each text input has textbox role (R): pass - Text input fields with textbox role found
- ✅: Helper text is programmatically associated (R): pass
- ✅: Text inputs are keyboard focusable (R): pass
- ✅: Visual labels are defined and persistent (R): pass
- ✅: Required fields are indicated (visually and programmatically) (R): pass
- ❌: Inputs use appropriate autocomplete for purpose (R): fail
- ✅: Placeholder text is programmatically defined as a property (R): pass - No placeholder text present on text inputs
Axe WCAG Failures (1) ❌
- (1x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (5) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 10 (Claude Opus 4.6)
Instruction set: Control
FAIL | Latency 0.00s cached
Axe WCAG: 1 | BP: 5 | $0.0175
Assertions ❌
- ✅: Each text input has an accessible name (R): pass
- ✅: Visible label is included in accessible name (R): pass
- ✅: Each text input has textbox role (R): pass - Text input fields with textbox role found
- ✅: Helper text is programmatically associated (R): pass
- ✅: Text inputs are keyboard focusable (R): pass
- ✅: Visual labels are defined and persistent (R): pass
- ✅: Required fields are indicated (visually and programmatically) (R): pass
- ❌: Inputs use appropriate autocomplete for purpose (R): fail
- ✅: Placeholder text is programmatically defined as a property (R): pass - No placeholder text present on text inputs
Axe WCAG Failures (1) ❌
- (1x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (5) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 11 (Claude Opus 4.6)
Instruction set: Control
FAIL | Latency 0.00s cached
Axe WCAG: 1 | BP: 5 | $0.0159
Assertions ❌
- ✅: Each text input has an accessible name (R): pass
- ✅: Visible label is included in accessible name (R): pass
- ✅: Each text input has textbox role (R): pass - Text input fields with textbox role found
- ❌: Helper text is programmatically associated (R): fail - Found `Please enter your email in the format: example@domain.com`; Found `Preferred format: (123) 456-7890`
- ✅: Text inputs are keyboard focusable (R): pass
- ✅: Visual labels are defined and persistent (R): pass
- ❌: Required fields are indicated (visually and programmatically) (R): fail - (2x) Input is programmatically required but has no visual required indicator
- ❌: Inputs use appropriate autocomplete for purpose (R): fail
- ✅: Placeholder text is programmatically defined as a property (R): pass - No placeholder text present on text inputs
Axe WCAG Failures (1) ❌
- (1x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (5) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 12 (Claude Opus 4.6)
Instruction set: Control
FAIL | Latency 0.00s cached
Axe WCAG: 1 | BP: 5 | $0.0161
Assertions ❌
- ✅: Each text input has an accessible name (R): pass
- ✅: Visible label is included in accessible name (R): pass
- ✅: Each text input has textbox role (R): pass - Text input fields with textbox role found
- ❌: Helper text is programmatically associated (R): fail - Found `Please enter your email in the format: example@domain.com`; found `Preferred format: (123) 456-7890`
- ✅: Text inputs are keyboard focusable (R): pass
- ✅: Visual labels are defined and persistent (R): pass
- ❌: Required fields are indicated (visually and programmatically) (R): fail - (2x) Input is programmatically required but has no visual required indicator
- ❌: Inputs use appropriate autocomplete for purpose (R): fail
- ✅: Placeholder text is programmatically defined as a property (R): pass - No placeholder text present on text inputs
Axe WCAG Failures (1) ❌
- (1x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (5) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 13 (Claude Opus 4.6)
Instruction set: Control
FAIL | Latency 0.00s cached
Axe WCAG: 1 | BP: 5 | $0.0175
Assertions ❌
- ✅: Each text input has an accessible name (R): pass
- ✅: Visible label is included in accessible name (R): pass
- ✅: Each text input has textbox role (R): pass - Text input fields with textbox role found
- ✅: Helper text is programmatically associated (R): pass
- ✅: Text inputs are keyboard focusable (R): pass
- ✅: Visual labels are defined and persistent (R): pass
- ✅: Required fields are indicated (visually and programmatically) (R): pass
- ❌: Inputs use appropriate autocomplete for purpose (R): fail
- ✅: Placeholder text is programmatically defined as a property (R): pass - No placeholder text present on text inputs
Axe WCAG Failures (1) ❌
- (1x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (5) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 14 (Claude Opus 4.6)
Instruction set: Control
FAIL | Latency 0.00s cached
Axe WCAG: 1 | BP: 5 | $0.0175
Assertions ❌
- ✅: Each text input has an accessible name (R): pass
- ✅: Visible label is included in accessible name (R): pass
- ✅: Each text input has textbox role (R): pass - Text input fields with textbox role found
- ✅: Helper text is programmatically associated (R): pass
- ✅: Text inputs are keyboard focusable (R): pass
- ✅: Visual labels are defined and persistent (R): pass
- ✅: Required fields are indicated (visually and programmatically) (R): pass
- ❌: Inputs use appropriate autocomplete for purpose (R): fail
- ✅: Placeholder text is programmatically defined as a property (R): pass - No placeholder text present on text inputs
Axe WCAG Failures (1) ❌
- (1x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (5) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 15 (Claude Opus 4.6)
Instruction set: Control
FAIL | Latency 0.00s cached
Axe WCAG: 1 | BP: 5 | $0.0175
Assertions ❌
- ✅: Each text input has an accessible name (R): pass
- ✅: Visible label is included in accessible name (R): pass
- ✅: Each text input has textbox role (R): pass - Text input fields with textbox role found
- ✅: Helper text is programmatically associated (R): pass
- ✅: Text inputs are keyboard focusable (R): pass
- ✅: Visual labels are defined and persistent (R): pass
- ✅: Required fields are indicated (visually and programmatically) (R): pass
- ❌: Inputs use appropriate autocomplete for purpose (R): fail
- ✅: Placeholder text is programmatically defined as a property (R): pass - No placeholder text present on text inputs
Axe WCAG Failures (1) ❌
- (1x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (5) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 16 (Claude Opus 4.6)
Instruction set: Control
FAIL | Latency 0.00s cached
Axe WCAG: 1 | BP: 5 | $0.0172
Assertions ❌
- ✅: Each text input has an accessible name (R): pass
- ✅: Visible label is included in accessible name (R): pass
- ✅: Each text input has textbox role (R): pass - Text input fields with textbox role found
- ✅: Helper text is programmatically associated (R): pass
- ✅: Text inputs are keyboard focusable (R): pass
- ✅: Visual labels are defined and persistent (R): pass
- ✅: Required fields are indicated (visually and programmatically) (R): pass
- ❌: Inputs use appropriate autocomplete for purpose (R): fail
- ✅: Placeholder text is programmatically defined as a property (R): pass - No placeholder text present on text inputs
Axe WCAG Failures (1) ❌
- (1x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (5) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 17 (Claude Opus 4.6)
Instruction set: Control
FAIL | Latency 0.00s cached
Axe WCAG: 1 | BP: 5 | $0.0175
Assertions ❌
- ✅: Each text input has an accessible name (R): pass
- ✅: Visible label is included in accessible name (R): pass
- ✅: Each text input has textbox role (R): pass - Text input fields with textbox role found
- ✅: Helper text is programmatically associated (R): pass
- ✅: Text inputs are keyboard focusable (R): pass
- ✅: Visual labels are defined and persistent (R): pass
- ✅: Required fields are indicated (visually and programmatically) (R): pass
- ❌: Inputs use appropriate autocomplete for purpose (R): fail
- ✅: Placeholder text is programmatically defined as a property (R): pass - No placeholder text present on text inputs
Axe WCAG Failures (1) ❌
- (1x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (5) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 18 (Claude Opus 4.6)
Instruction set: Control
FAIL | Latency 0.00s cached
Axe WCAG: 1 | BP: 5 | $0.0175
Assertions ❌
- ✅: Each text input has an accessible name (R): pass
- ✅: Visible label is included in accessible name (R): pass
- ✅: Each text input has textbox role (R): pass - Text input fields with textbox role found
- ✅: Helper text is programmatically associated (R): pass
- ✅: Text inputs are keyboard focusable (R): pass
- ✅: Visual labels are defined and persistent (R): pass
- ✅: Required fields are indicated (visually and programmatically) (R): pass
- ❌: Inputs use appropriate autocomplete for purpose (R): fail
- ✅: Placeholder text is programmatically defined as a property (R): pass - No placeholder text present on text inputs
Axe WCAG Failures (1) ❌
- (1x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (5) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 19 (Claude Opus 4.6)
Instruction set: Control
FAIL | Latency 0.00s cached
Axe WCAG: 1 | BP: 5 | $0.0161
Assertions ❌
- ✅: Each text input has an accessible name (R): pass
- ✅: Visible label is included in accessible name (R): pass
- ✅: Each text input has textbox role (R): pass - Text input fields with textbox role found
- ❌: Helper text is programmatically associated (R): fail - Found `Please enter your email in the format: name@example.com`; found `Preferred format: (123) 456-7890`
- ✅: Text inputs are keyboard focusable (R): pass
- ✅: Visual labels are defined and persistent (R): pass
- ❌: Required fields are indicated (visually and programmatically) (R): fail - (2x) Input is programmatically required but has no visual required indicator
- ❌: Inputs use appropriate autocomplete for purpose (R): fail
- ✅: Placeholder text is programmatically defined as a property (R): pass - No placeholder text present on text inputs
Axe WCAG Failures (1) ❌
- (1x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (5) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 20 (Claude Opus 4.6)
Instruction set: Control
FAIL | Latency 0.00s cached
Axe WCAG: 1 | BP: 5 | $0.0175
Assertions ❌
- ✅: Each text input has an accessible name (R): pass
- ✅: Visible label is included in accessible name (R): pass
- ✅: Each text input has textbox role (R): pass - Text input fields with textbox role found
- ✅: Helper text is programmatically associated (R): pass
- ✅: Text inputs are keyboard focusable (R): pass
- ✅: Visual labels are defined and persistent (R): pass
- ✅: Required fields are indicated (visually and programmatically) (R): pass
- ❌: Inputs use appropriate autocomplete for purpose (R): fail
- ✅: Placeholder text is programmatically defined as a property (R): pass - No placeholder text present on text inputs
Axe WCAG Failures (1) ❌
- (1x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (5) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 21 (Claude Opus 4.6)
Instruction set: Control
FAIL | Latency 0.00s cached
Axe WCAG: 1 | BP: 5 | $0.0161
Assertions ❌
- ✅: Each text input has an accessible name (R): pass
- ✅: Visible label is included in accessible name (R): pass
- ✅: Each text input has textbox role (R): pass - Text input fields with textbox role found
- ❌: Helper text is programmatically associated (R): fail - Found `Please enter your email in the format: example@domain.com`; found `Preferred format: (123) 456-7890`
- ✅: Text inputs are keyboard focusable (R): pass
- ✅: Visual labels are defined and persistent (R): pass
- ❌: Required fields are indicated (visually and programmatically) (R): fail - (2x) Input is programmatically required but has no visual required indicator
- ❌: Inputs use appropriate autocomplete for purpose (R): fail
- ✅: Placeholder text is programmatically defined as a property (R): pass - No placeholder text present on text inputs
Axe WCAG Failures (1) ❌
- (1x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (5) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 22 (Claude Opus 4.6)
Instruction set: Control
FAIL | Latency 0.00s cached
Axe WCAG: 1 | BP: 5 | $0.0175
Assertions ❌
- ✅: Each text input has an accessible name (R): pass
- ✅: Visible label is included in accessible name (R): pass
- ✅: Each text input has textbox role (R): pass - Text input fields with textbox role found
- ✅: Helper text is programmatically associated (R): pass
- ✅: Text inputs are keyboard focusable (R): pass
- ✅: Visual labels are defined and persistent (R): pass
- ✅: Required fields are indicated (visually and programmatically) (R): pass
- ❌: Inputs use appropriate autocomplete for purpose (R): fail
- ✅: Placeholder text is programmatically defined as a property (R): pass - No placeholder text present on text inputs
Axe WCAG Failures (1) ❌
- (1x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (5) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 23 (Claude Opus 4.6)
Instruction set: Control
FAIL | Latency 0.00s cached
Axe WCAG: 1 | BP: 5 | $0.0159
Assertions ❌
- ✅: Each text input has an accessible name (R): pass
- ✅: Visible label is included in accessible name (R): pass
- ✅: Each text input has textbox role (R): pass - Text input fields with textbox role found
- ❌: Helper text is programmatically associated (R): fail - Found `Please enter your email in the format: example@domain.com`; found `Preferred format: (123) 456-7890`
- ✅: Text inputs are keyboard focusable (R): pass
- ✅: Visual labels are defined and persistent (R): pass
- ❌: Required fields are indicated (visually and programmatically) (R): fail - (2x) Input is programmatically required but has no visual required indicator
- ❌: Inputs use appropriate autocomplete for purpose (R): fail
- ✅: Placeholder text is programmatically defined as a property (R): pass - No placeholder text present on text inputs
Axe WCAG Failures (1) ❌
- (1x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (5) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 24 (Claude Opus 4.6)
Instruction set: Control
FAIL | Latency 0.00s cached
Axe WCAG: 1 | BP: 5 | $0.0161
Assertions ❌
- ✅: Each text input has an accessible name (R): pass
- ✅: Visible label is included in accessible name (R): pass
- ✅: Each text input has textbox role (R): pass - Text input fields with textbox role found
- ❌: Helper text is programmatically associated (R): fail - Found `Please enter your email in the format: example@domain.com`; found `Preferred format: (123) 456-7890`
- ✅: Text inputs are keyboard focusable (R): pass
- ✅: Visual labels are defined and persistent (R): pass
- ❌: Required fields are indicated (visually and programmatically) (R): fail - (2x) Input is programmatically required but has no visual required indicator
- ❌: Inputs use appropriate autocomplete for purpose (R): fail
- ✅: Placeholder text is programmatically defined as a property (R): pass - No placeholder text present on text inputs
Axe WCAG Failures (1) ❌
- (1x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (5) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 0 (Claude Opus 4.6)
Instruction set: 1. Basic
PASS | Latency 0.00s cached
Axe WCAG: 0 | BP: 5 | $0.0287
Assertions ✅
- ✅: Each text input has an accessible name (R): pass
- ✅: Visible label is included in accessible name (R): pass
- ✅: Each text input has textbox role (R): pass - Text input fields with textbox role found
- ✅: Helper text is programmatically associated (R): pass
- ✅: Text inputs are keyboard focusable (R): pass
- ✅: Visual labels are defined and persistent (R): pass
- ✅: Required fields are indicated (visually and programmatically) (R): pass
- ✅: Inputs use appropriate autocomplete for purpose (R): pass
- ✅: Placeholder text is programmatically defined as a property (R): pass - No placeholder text present on text inputs
Axe Best Practice Issues (5) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 1 (Claude Opus 4.6)
Instruction set: 1. Basic
PASS | Latency 0.00s cached
Axe WCAG: 0 | BP: 5 | $0.0287
Assertions ✅
- ✅: Each text input has an accessible name (R): pass
- ✅: Visible label is included in accessible name (R): pass
- ✅: Each text input has textbox role (R): pass - Text input fields with textbox role found
- ✅: Helper text is programmatically associated (R): pass
- ✅: Text inputs are keyboard focusable (R): pass
- ✅: Visual labels are defined and persistent (R): pass
- ✅: Required fields are indicated (visually and programmatically) (R): pass
- ✅: Inputs use appropriate autocomplete for purpose (R): pass
- ✅: Placeholder text is programmatically defined as a property (R): pass - No placeholder text present on text inputs
Axe Best Practice Issues (5) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 2 (Claude Opus 4.6)
Instruction set: 1. Basic
PASS | Latency 0.00s cached
Axe WCAG: 0 | BP: 5 | $0.0287
Assertions ✅
- ✅: Each text input has an accessible name (R): pass
- ✅: Visible label is included in accessible name (R): pass
- ✅: Each text input has textbox role (R): pass - Text input fields with textbox role found
- ✅: Helper text is programmatically associated (R): pass
- ✅: Text inputs are keyboard focusable (R): pass
- ✅: Visual labels are defined and persistent (R): pass
- ✅: Required fields are indicated (visually and programmatically) (R): pass
- ✅: Inputs use appropriate autocomplete for purpose (R): pass
- ✅: Placeholder text is programmatically defined as a property (R): pass - No placeholder text present on text inputs
Axe Best Practice Issues (5) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 3 (Claude Opus 4.6)
Instruction set: 1. Basic
PASS | Latency 0.00s cached
Axe WCAG: 0 | BP: 6 | $0.0294
Assertions ✅
- ✅: Each text input has an accessible name (R): pass
- ✅: Visible label is included in accessible name (R): pass
- ✅: Each text input has textbox role (R): pass - Text input fields with textbox role found
- ✅: Helper text is programmatically associated (R): pass
- ✅: Text inputs are keyboard focusable (R): pass
- ✅: Visual labels are defined and persistent (R): pass
- ✅: Required fields are indicated (visually and programmatically) (R): pass
- ✅: Inputs use appropriate autocomplete for purpose (R): pass
- ✅: Placeholder text is programmatically defined as a property (R): pass - No placeholder text present on text inputs
Axe Best Practice Issues (6) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 4 (Claude Opus 4.6)
Instruction set: 1. Basic
PASS | Latency 0.00s cached
Axe WCAG: 0 | BP: 5 | $0.0287
Assertions ✅
- ✅: Each text input has an accessible name (R): pass
- ✅: Visible label is included in accessible name (R): pass
- ✅: Each text input has textbox role (R): pass - Text input fields with textbox role found
- ✅: Helper text is programmatically associated (R): pass
- ✅: Text inputs are keyboard focusable (R): pass
- ✅: Visual labels are defined and persistent (R): pass
- ✅: Required fields are indicated (visually and programmatically) (R): pass
- ✅: Inputs use appropriate autocomplete for purpose (R): pass
- ✅: Placeholder text is programmatically defined as a property (R): pass - No placeholder text present on text inputs
Axe Best Practice Issues (5) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 0 (Claude Opus 4.6)
Instruction set: 0. Minimal
FAIL | Latency 0.00s cached
Axe WCAG: 0 | BP: 5 | $0.0196
Assertions ❌
- ✅: Each text input has an accessible name (R): pass
- ✅: Visible label is included in accessible name (R): pass
- ✅: Each text input has textbox role (R): pass - Text input fields with textbox role found
- ✅: Helper text is programmatically associated (R): pass
- ✅: Text inputs are keyboard focusable (R): pass
- ✅: Visual labels are defined and persistent (R): pass
- ✅: Required fields are indicated (visually and programmatically) (R): pass
- ❌: Inputs use appropriate autocomplete for purpose (R): fail
- ✅: Placeholder text is programmatically defined as a property (R): pass - No placeholder text present on text inputs
Axe Best Practice Issues (5) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 1 (Claude Opus 4.6)
Instruction set: 0. Minimal
FAIL | Latency 0.00s cached
Axe WCAG: 0 | BP: 5 | $0.0196
Assertions ❌
- ✅: Each text input has an accessible name (R): pass
- ✅: Visible label is included in accessible name (R): pass
- ✅: Each text input has textbox role (R): pass - Text input fields with textbox role found
- ✅: Helper text is programmatically associated (R): pass
- ✅: Text inputs are keyboard focusable (R): pass
- ✅: Visual labels are defined and persistent (R): pass
- ✅: Required fields are indicated (visually and programmatically) (R): pass
- ❌: Inputs use appropriate autocomplete for purpose (R): fail
- ✅: Placeholder text is programmatically defined as a property (R): pass - No placeholder text present on text inputs
Axe Best Practice Issues (5) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 2 (Claude Opus 4.6)
Instruction set: 0. Minimal
PASS | Latency 0.00s cached
Axe WCAG: 0 | BP: 5 | $0.0211
Assertions ✅
- ✅: Each text input has an accessible name (R): pass
- ✅: Visible label is included in accessible name (R): pass
- ✅: Each text input has textbox role (R): pass - Text input fields with textbox role found
- ✅: Helper text is programmatically associated (R): pass
- ✅: Text inputs are keyboard focusable (R): pass
- ✅: Visual labels are defined and persistent (R): pass
- ✅: Required fields are indicated (visually and programmatically) (R): pass
- ✅: Inputs use appropriate autocomplete for purpose (R): pass
- ✅: Placeholder text is programmatically defined as a property (R): pass - No placeholder text present on text inputs
Axe Best Practice Issues (5) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 3 (Claude Opus 4.6)
Instruction set: 0. Minimal
PASS | Latency 0.00s cached
Axe WCAG: 0 | BP: 5 | $0.0211
Assertions ✅
- ✅: Each text input has an accessible name (R): pass
- ✅: Visible label is included in accessible name (R): pass
- ✅: Each text input has textbox role (R): pass - Text input fields with textbox role found
- ✅: Helper text is programmatically associated (R): pass
- ✅: Text inputs are keyboard focusable (R): pass
- ✅: Visual labels are defined and persistent (R): pass
- ✅: Required fields are indicated (visually and programmatically) (R): pass
- ✅: Inputs use appropriate autocomplete for purpose (R): pass
- ✅: Placeholder text is programmatically defined as a property (R): pass - No placeholder text present on text inputs
Axe Best Practice Issues (5) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 4 (Claude Opus 4.6)
Instruction set: 0. Minimal
PASS | Latency 0.00s cached
Axe WCAG: 0 | BP: 5 | $0.0213
Assertions ✅
- ✅: Each text input has an accessible name (R): pass
- ✅: Visible label is included in accessible name (R): pass
- ✅: Each text input has textbox role (R): pass - Text input fields with textbox role found
- ✅: Helper text is programmatically associated (R): pass
- ✅: Text inputs are keyboard focusable (R): pass
- ✅: Visual labels are defined and persistent (R): pass
- ✅: Required fields are indicated (visually and programmatically) (R): pass
- ✅: Inputs use appropriate autocomplete for purpose (R): pass
- ✅: Placeholder text is programmatically defined as a property (R): pass - No placeholder text present on text inputs
Axe Best Practice Issues (5) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 0 (Claude Opus 4.6)
Instruction set: 2. Detailed Instructions
PASS | Latency 0.00s cached
Axe WCAG: 0 | $0.0904
Assertions ✅
- ✅: Each text input has an accessible name (R): pass
- ✅: Visible label is included in accessible name (R): pass
- ✅: Each text input has textbox role (R): pass - Text input fields with textbox role found
- ✅: Helper text is programmatically associated (R): pass
- ✅: Text inputs are keyboard focusable (R): pass
- ✅: Visual labels are defined and persistent (R): pass
- ✅: Required fields are indicated (visually and programmatically) (R): pass
- ✅: Inputs use appropriate autocomplete for purpose (R): pass
- ✅: Placeholder text is programmatically defined as a property (R): pass - No placeholder text present on text inputs
Sample 1 (Claude Opus 4.6)
Instruction set: 2. Detailed Instructions
PASS | Latency 0.00s cached
Axe WCAG: 0 | $0.0860
Assertions ✅
- ✅: Each text input has an accessible name (R): pass
- ✅: Visible label is included in accessible name (R): pass
- ✅: Each text input has textbox role (R): pass - Text input fields with textbox role found
- ✅: Helper text is programmatically associated (R): pass
- ✅: Text inputs are keyboard focusable (R): pass
- ✅: Visual labels are defined and persistent (R): pass
- ✅: Required fields are indicated (visually and programmatically) (R): pass
- ✅: Inputs use appropriate autocomplete for purpose (R): pass
- ✅: Placeholder text is programmatically defined as a property (R): pass - No placeholder text present on text inputs
Sample 2 (Claude Opus 4.6)
Instruction set: 2. Detailed Instructions
PASS | Latency 35.60s
Axe WCAG: 0 | $0.1000
Assertions ✅
- ✅: Each text input has an accessible name (R): pass
- ✅: Visible label is included in accessible name (R): pass
- ✅: Each text input has textbox role (R): pass - Text input fields with textbox role found
- ✅: Helper text is programmatically associated (R): pass
- ✅: Text inputs are keyboard focusable (R): pass
- ✅: Visual labels are defined and persistent (R): pass
- ✅: Required fields are indicated (visually and programmatically) (R): pass
- ✅: Inputs use appropriate autocomplete for purpose (R): pass
- ✅: Placeholder text is programmatically defined as a property (R): pass - No placeholder text present on text inputs
Sample 3 (Claude Opus 4.6)
Instruction set: 2. Detailed Instructions
PASS | Latency 43.27s
Axe WCAG: 0 | $0.1220
Assertions ✅
- ✅: Each text input has an accessible name (R): pass
- ✅: Visible label is included in accessible name (R): pass
- ✅: Each text input has textbox role (R): pass - Text input fields with textbox role found
- ✅: Helper text is programmatically associated (R): pass
- ✅: Text inputs are keyboard focusable (R): pass
- ✅: Visual labels are defined and persistent (R): pass
- ✅: Required fields are indicated (visually and programmatically) (R): pass
- ✅: Inputs use appropriate autocomplete for purpose (R): pass
- ✅: Placeholder text is programmatically defined as a property (R): pass - No placeholder text present on text inputs
Sample 4 (Claude Opus 4.6)
Instruction set: 2. Detailed Instructions
PASS | Latency 29.59s
Axe WCAG: 0 | $0.0883
Assertions ✅
- ✅: Each text input has an accessible name (R): pass
- ✅: Visible label is included in accessible name (R): pass
- ✅: Each text input has textbox role (R): pass - Text input fields with textbox role found
- ✅: Helper text is programmatically associated (R): pass
- ✅: Text inputs are keyboard focusable (R): pass
- ✅: Visual labels are defined and persistent (R): pass
- ✅: Required fields are indicated (visually and programmatically) (R): pass
- ✅: Inputs use appropriate autocomplete for purpose (R): pass
- ✅: Placeholder text is programmatically defined as a property (R): pass - No placeholder text present on text inputs
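The PASS/FAIL line on each sample above appears to follow a simple rule: a sample passes only when it has zero Axe WCAG failures and every assertion passes, while Axe best-practice issues are reported but, as the listings state, do not affect pass/fail. The following is a minimal sketch of that rule as inferred from the listings, not the benchmark's actual implementation; `SampleResult`, its field names, and `verdict` are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class SampleResult:
    """Hypothetical record of one sample; field names are assumptions."""
    axe_wcag_failures: int          # the "Axe WCAG: N" count
    best_practice_issues: int       # the "BP: N" count (informational only)
    assertions: dict[str, bool]     # assertion name -> passed?


def verdict(sample: SampleResult) -> str:
    """Return "PASS" only if there are no Axe WCAG failures and no failed assertions.

    Best-practice issues are deliberately ignored, mirroring the
    "does not affect pass/fail" note in the listings.
    """
    all_assertions_pass = all(sample.assertions.values())
    if sample.axe_wcag_failures == 0 and all_assertions_pass:
        return "PASS"
    return "FAIL"


# A sample with no Axe WCAG failures but one failed assertion still fails.
autocomplete_miss = SampleResult(
    axe_wcag_failures=0,
    best_practice_issues=5,
    assertions={"accessible name": True, "autocomplete": False},
)
# A sample with best-practice issues only is still a pass.
clean = SampleResult(
    axe_wcag_failures=0,
    best_practice_issues=5,
    assertions={"accessible name": True, "autocomplete": True},
)
print(verdict(autocomplete_miss))  # FAIL
print(verdict(clean))              # PASS
```

This matches the pattern in the listings, where samples with `Axe WCAG: 0 | BP: 5` fail when the autocomplete assertion fails and pass when all assertions pass.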
Claude Sonnet 4.5
Aggregates are shown when filtering to a specific instruction set.
| Samples | Passes | pass@1 | pass@5 | pass@10 |
|---|---|---|---|---|
| 25 | 0 | 0% | 0% | 0% |
| 5 | 2 | 40% | 100% | 100% |
| 5 | 3 | 60% | 100% | 100% |
| 5 | 4 | 80% | 100% | 100% |
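The pass@k figures above are consistent with the standard unbiased estimator, 1 - C(n-c, k) / C(n, k), where n is the number of samples and c the number of passes. The sketch below is an assumption about how the values are derived, not the report's confirmed method; the function name `pass_at_k` is hypothetical, and returning 100% whenever k exceeds the number of failing samples (including k > n) is an assumed convention that happens to match the 5-sample rows.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Estimate the probability that at least one of k samples passes.

    n: total samples drawn, c: samples that passed, k: samples selected.
    """
    if not 0 <= c <= n or k < 1:
        raise ValueError("need 0 <= c <= n and k >= 1")
    if c == 0:
        return 0.0          # no passing sample can ever be selected
    if n - c < k:
        return 1.0          # too few failures to fill all k slots
    # Probability that all k selected samples are failures, subtracted from 1.
    return 1.0 - comb(n - c, k) / comb(n, k)


# Reproduce the aggregate rows shown above as (samples, passes) pairs.
for n, c in [(25, 0), (5, 2), (5, 3), (5, 4)]:
    row = " | ".join(f"{pass_at_k(n, c, k):.0%}" for k in (1, 5, 10))
    print(f"Samples: {n} | Passes: {c} -> {row}")
```

With 25 samples and 0 passes every estimate is 0%, while any 5-sample set with at least one pass reaches 100% at k = 5, because selecting all five samples necessarily includes a passing one.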
Sample 0 (Claude Sonnet 4.5)
Instruction set: Control
FAIL | Latency 0.00s cached
Axe WCAG: 1 | BP: 5 | $0.0121
Assertions ❌
- ✅: Each text input has an accessible name (R): pass
- ✅: Visible label is included in accessible name (R): pass
- ✅: Each text input has textbox role (R): pass - Text input fields with textbox role found
- ❌: Helper text is programmatically associated (R): fail - Found `Please enter a valid email address (e.g., name@example.com)`; found `Preferred format: (123) 456-7890`
- ✅: Text inputs are keyboard focusable (R): pass
- ✅: Visual labels are defined and persistent (R): pass
- ✅: Required fields are indicated (visually and programmatically) (R): pass
- ❌: Inputs use appropriate autocomplete for purpose (R): fail
- ✅: Placeholder text is programmatically defined as a property (R): pass - No placeholder text present on text inputs
Axe WCAG Failures (1) ❌
- (1x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (5) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 1 (Claude Sonnet 4.5)
Instruction set: Control
FAIL | Latency 0.00s cached
Axe WCAG: 1 | BP: 5 | $0.0142
Assertions ❌
- ✅: Each text input has an accessible name (R): pass
- ✅: Visible label is included in accessible name (R): pass
- ✅: Each text input has textbox role (R): pass - Text input fields with textbox role found
- ❌: Helper text is programmatically associated (R): fail - Found `Please enter a valid email address (e.g., example@domain.com)`; Found `Preferred format: (123) 456-7890`
- ✅: Text inputs are keyboard focusable (R): pass
- ✅: Visual labels are defined and persistent (R): pass
- ❌: Required fields are indicated (visually and programmatically) (R): fail - (2x) Input is programmatically required but has no visual required indicator
- ❌: Inputs use appropriate autocomplete for purpose (R): fail
- ✅: Placeholder text is programmatically defined as a property (R): pass - No placeholder text present on text inputs
Axe WCAG Failures (1) ❌
- (1x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (5) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 2 (Claude Sonnet 4.5)
Instruction set: Control
FAIL | Latency 0.00s cached
Axe WCAG: 3 | BP: 5 | $0.0145
Assertions ❌
- ✅: Each text input has an accessible name (R): pass
- ✅: Visible label is included in accessible name (R): pass
- ✅: Each text input has textbox role (R): pass - Text input fields with textbox role found
- ❌: Helper text is programmatically associated (R): fail - Found `Please enter a valid email address (e.g., example@domain.com)`; Found `Preferred format: (123) 456-7890 or 123-456-7890`
- ✅: Text inputs are keyboard focusable (R): pass
- ✅: Visual labels are defined and persistent (R): pass
- ✅: Required fields are indicated (visually and programmatically) (R): pass
- ❌: Inputs use appropriate autocomplete for purpose (R): fail
- ✅: Placeholder text is programmatically defined as a property (R): pass - No placeholder text present on text inputs
Axe WCAG Failures (3) ❌
- (3x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (5) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 3 (Claude Sonnet 4.5)
Instruction set: Control
FAIL | Latency 0.00s cached
Axe WCAG: 3 | BP: 5 | $0.0145
Assertions ❌
- ✅: Each text input has an accessible name (R): pass
- ✅: Visible label is included in accessible name (R): pass
- ✅: Each text input has textbox role (R): pass - Text input fields with textbox role found
- ❌: Helper text is programmatically associated (R): fail - Found `Please enter a valid email address (e.g., example@domain.com)`; Found `Preferred format: (123) 456-7890 or 123-456-7890`
- ✅: Text inputs are keyboard focusable (R): pass
- ✅: Visual labels are defined and persistent (R): pass
- ✅: Required fields are indicated (visually and programmatically) (R): pass
- ❌: Inputs use appropriate autocomplete for purpose (R): fail
- ✅: Placeholder text is programmatically defined as a property (R): pass - No placeholder text present on text inputs
Axe WCAG Failures (3) ❌
- (3x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (5) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 4 (Claude Sonnet 4.5)
Instruction set: Control
FAIL | Latency 0.00s cached
Axe WCAG: 3 | BP: 5 | $0.0145
Assertions ❌
- ✅: Each text input has an accessible name (R): pass
- ✅: Visible label is included in accessible name (R): pass
- ✅: Each text input has textbox role (R): pass - Text input fields with textbox role found
- ❌: Helper text is programmatically associated (R): fail - Found `Please enter a valid email address (e.g., example@domain.com)`; Found `Preferred format: (123) 456-7890 or 123-456-7890`
- ✅: Text inputs are keyboard focusable (R): pass
- ✅: Visual labels are defined and persistent (R): pass
- ✅: Required fields are indicated (visually and programmatically) (R): pass
- ❌: Inputs use appropriate autocomplete for purpose (R): fail
- ✅: Placeholder text is programmatically defined as a property (R): pass - No placeholder text present on text inputs
Axe WCAG Failures (3) ❌
- (3x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (5) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 5 (Claude Sonnet 4.5)
Instruction set: Control
FAIL | Latency 0.00s cached
Axe WCAG: 3 | BP: 5 | $0.0143
Assertions ❌
- ✅: Each text input has an accessible name (R): pass
- ✅: Visible label is included in accessible name (R): pass
- ✅: Each text input has textbox role (R): pass - Text input fields with textbox role found
- ❌: Helper text is programmatically associated (R): fail - Found `Please enter a valid email address (e.g., example@domain.com)`; Found `Preferred format: (123) 456-7890 or 123-456-7890`
- ✅: Text inputs are keyboard focusable (R): pass
- ✅: Visual labels are defined and persistent (R): pass
- ✅: Required fields are indicated (visually and programmatically) (R): pass
- ❌: Inputs use appropriate autocomplete for purpose (R): fail
- ✅: Placeholder text is programmatically defined as a property (R): pass - No placeholder text present on text inputs
Axe WCAG Failures (3) ❌
- (3x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (5) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 6 (Claude Sonnet 4.5)
Instruction set: Control
FAIL | Latency 0.00s cached
Axe WCAG: 3 | BP: 5 | $0.0145
Assertions ❌
- ✅: Each text input has an accessible name (R): pass
- ✅: Visible label is included in accessible name (R): pass
- ✅: Each text input has textbox role (R): pass - Text input fields with textbox role found
- ❌: Helper text is programmatically associated (R): fail - Found `Please enter a valid email address (e.g., example@domain.com)`; Found `Preferred format: (123) 456-7890 or 123-456-7890`
- ✅: Text inputs are keyboard focusable (R): pass
- ✅: Visual labels are defined and persistent (R): pass
- ✅: Required fields are indicated (visually and programmatically) (R): pass
- ❌: Inputs use appropriate autocomplete for purpose (R): fail
- ✅: Placeholder text is programmatically defined as a property (R): pass - No placeholder text present on text inputs
Axe WCAG Failures (3) ❌
- (3x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (5) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 7 (Claude Sonnet 4.5)
Instruction set: Control
FAIL | Latency 0.00s cached
Axe WCAG: 3 | BP: 5 | $0.0145
Assertions ❌
- ✅: Each text input has an accessible name (R): pass
- ✅: Visible label is included in accessible name (R): pass
- ✅: Each text input has textbox role (R): pass - Text input fields with textbox role found
- ❌: Helper text is programmatically associated (R): fail - Found `Please enter a valid email address (e.g., example@domain.com)`; Found `Preferred format: (123) 456-7890 or 123-456-7890`
- ✅: Text inputs are keyboard focusable (R): pass
- ✅: Visual labels are defined and persistent (R): pass
- ✅: Required fields are indicated (visually and programmatically) (R): pass
- ❌: Inputs use appropriate autocomplete for purpose (R): fail
- ✅: Placeholder text is programmatically defined as a property (R): pass - No placeholder text present on text inputs
Axe WCAG Failures (3) ❌
- (3x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (5) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 8 (Claude Sonnet 4.5)
Instruction set: Control
FAIL | Latency 0.00s cached
Axe WCAG: 3 | BP: 5 | $0.0145
Assertions ❌
- ✅: Each text input has an accessible name (R): pass
- ✅: Visible label is included in accessible name (R): pass
- ✅: Each text input has textbox role (R): pass - Text input fields with textbox role found
- ❌: Helper text is programmatically associated (R): fail - Found `Please enter a valid email address (e.g., example@domain.com)`; Found `Preferred format: (123) 456-7890 or 123-456-7890`
- ✅: Text inputs are keyboard focusable (R): pass
- ✅: Visual labels are defined and persistent (R): pass
- ✅: Required fields are indicated (visually and programmatically) (R): pass
- ❌: Inputs use appropriate autocomplete for purpose (R): fail
- ✅: Placeholder text is programmatically defined as a property (R): pass - No placeholder text present on text inputs
Axe WCAG Failures (3) ❌
- (3x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (5) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 9 (Claude Sonnet 4.5)
Instruction set: Control
FAIL | Latency 0.00s cached
Axe WCAG: 3 | BP: 5 | $0.0147
Assertions ❌
- ✅: Each text input has an accessible name (R): pass
- ✅: Visible label is included in accessible name (R): pass
- ✅: Each text input has textbox role (R): pass - Text input fields with textbox role found
- ❌: Helper text is programmatically associated (R): fail - Found `Please enter a valid email address (e.g., example@domain.com)`; Found `Preferred format: (123) 456-7890 or 123-456-7890`
- ✅: Text inputs are keyboard focusable (R): pass
- ✅: Visual labels are defined and persistent (R): pass
- ✅: Required fields are indicated (visually and programmatically) (R): pass
- ❌: Inputs use appropriate autocomplete for purpose (R): fail
- ✅: Placeholder text is programmatically defined as a property (R): pass - No placeholder text present on text inputs
Axe WCAG Failures (3) ❌
- (3x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (5) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 10 (Claude Sonnet 4.5)
Instruction set: Control
FAIL | Latency 0.00s cached
Axe WCAG: 3 | BP: 5 | $0.0143
Assertions ❌
- ✅: Each text input has an accessible name (R): pass
- ✅: Visible label is included in accessible name (R): pass
- ✅: Each text input has textbox role (R): pass - Text input fields with textbox role found
- ❌: Helper text is programmatically associated (R): fail - Found `Please enter a valid email address (e.g., example@domain.com)`; Found `Preferred format: (123) 456-7890 or 123-456-7890`
- ✅: Text inputs are keyboard focusable (R): pass
- ✅: Visual labels are defined and persistent (R): pass
- ✅: Required fields are indicated (visually and programmatically) (R): pass
- ❌: Inputs use appropriate autocomplete for purpose (R): fail
- ✅: Placeholder text is programmatically defined as a property (R): pass - No placeholder text present on text inputs
Axe WCAG Failures (3) ❌
- (3x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (5) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 11 (Claude Sonnet 4.5)
Instruction set: Control
FAIL | Latency 0.00s cached
Axe WCAG: 3 | BP: 5 | $0.0145
Assertions ❌
- ✅: Each text input has an accessible name (R): pass
- ✅: Visible label is included in accessible name (R): pass
- ✅: Each text input has textbox role (R): pass - Text input fields with textbox role found
- ❌: Helper text is programmatically associated (R): fail - Found `Please enter a valid email address (e.g., example@domain.com)`; Found `Preferred format: (123) 456-7890 or 123-456-7890`
- ✅: Text inputs are keyboard focusable (R): pass
- ✅: Visual labels are defined and persistent (R): pass
- ✅: Required fields are indicated (visually and programmatically) (R): pass
- ❌: Inputs use appropriate autocomplete for purpose (R): fail
- ✅: Placeholder text is programmatically defined as a property (R): pass - No placeholder text present on text inputs
Axe WCAG Failures (3) ❌
- (3x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (5) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 12 (Claude Sonnet 4.5)
Instruction set: Control
FAIL | Latency 0.00s cached
Axe WCAG: 3 | BP: 5 | $0.0145
Assertions ❌
- ✅: Each text input has an accessible name (R): pass
- ✅: Visible label is included in accessible name (R): pass
- ✅: Each text input has textbox role (R): pass - Text input fields with textbox role found
- ❌: Helper text is programmatically associated (R): fail - Found `Please enter a valid email address (e.g., example@domain.com)`; Found `Preferred format: (123) 456-7890 or 123-456-7890`
- ✅: Text inputs are keyboard focusable (R): pass
- ✅: Visual labels are defined and persistent (R): pass
- ✅: Required fields are indicated (visually and programmatically) (R): pass
- ❌: Inputs use appropriate autocomplete for purpose (R): fail
- ✅: Placeholder text is programmatically defined as a property (R): pass - No placeholder text present on text inputs
Axe WCAG Failures (3) ❌
- (3x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (5) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 13 (Claude Sonnet 4.5)
Instruction set: Control
FAIL | Latency 0.00s cached
Axe WCAG: 3 | BP: 5 | $0.0145
Assertions ❌
- ✅: Each text input has an accessible name (R): pass
- ✅: Visible label is included in accessible name (R): pass
- ✅: Each text input has textbox role (R): pass - Text input fields with textbox role found
- ❌: Helper text is programmatically associated (R): fail - Found `Please enter a valid email address (e.g., example@domain.com)`; Found `Preferred format: (123) 456-7890 or 123-456-7890`
- ✅: Text inputs are keyboard focusable (R): pass
- ✅: Visual labels are defined and persistent (R): pass
- ✅: Required fields are indicated (visually and programmatically) (R): pass
- ❌: Inputs use appropriate autocomplete for purpose (R): fail
- ✅: Placeholder text is programmatically defined as a property (R): pass - No placeholder text present on text inputs
Axe WCAG Failures (3) ❌
- (3x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (5) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 14 (Claude Sonnet 4.5)
Instruction set: Control
FAIL | Latency 0.00s cached
Axe WCAG: 3 | BP: 5 | $0.0147
Assertions ❌
- ✅: Each text input has an accessible name (R): pass
- ✅: Visible label is included in accessible name (R): pass
- ✅: Each text input has textbox role (R): pass - Text input fields with textbox role found
- ❌: Helper text is programmatically associated (R): fail - Found `Please enter a valid email address (e.g., example@domain.com)`; Found `Preferred format: (123) 456-7890 or 123-456-7890`
- ✅: Text inputs are keyboard focusable (R): pass
- ✅: Visual labels are defined and persistent (R): pass
- ✅: Required fields are indicated (visually and programmatically) (R): pass
- ❌: Inputs use appropriate autocomplete for purpose (R): fail
- ✅: Placeholder text is programmatically defined as a property (R): pass - No placeholder text present on text inputs
Axe WCAG Failures (3) ❌
- (3x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (5) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 15 (Claude Sonnet 4.5)
Instruction set: Control
FAIL | Latency 0.00s cached
Axe WCAG: 3 | BP: 5 | $0.0145
Assertions ❌
- ✅: Each text input has an accessible name (R): pass
- ✅: Visible label is included in accessible name (R): pass
- ✅: Each text input has textbox role (R): pass - Text input fields with textbox role found
- ❌: Helper text is programmatically associated (R): fail - Found `Please enter a valid email address (e.g., example@domain.com)`; Found `Preferred format: (123) 456-7890 or 123-456-7890`
- ✅: Text inputs are keyboard focusable (R): pass
- ✅: Visual labels are defined and persistent (R): pass
- ✅: Required fields are indicated (visually and programmatically) (R): pass
- ❌: Inputs use appropriate autocomplete for purpose (R): fail
- ✅: Placeholder text is programmatically defined as a property (R): pass - No placeholder text present on text inputs
Axe WCAG Failures (3) ❌
- (3x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (5) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 16 (Claude Sonnet 4.5)
Instruction set: Control
FAIL | Latency 0.00s cached
Axe WCAG: 3 | BP: 5 | $0.0143
Assertions ❌
- ✅: Each text input has an accessible name (R): pass
- ✅: Visible label is included in accessible name (R): pass
- ✅: Each text input has textbox role (R): pass - Text input fields with textbox role found
- ❌: Helper text is programmatically associated (R): fail - Found `Please enter a valid email address (e.g., example@domain.com)`; Found `Preferred format: (123) 456-7890 or 123-456-7890`
- ✅: Text inputs are keyboard focusable (R): pass
- ✅: Visual labels are defined and persistent (R): pass
- ✅: Required fields are indicated (visually and programmatically) (R): pass
- ❌: Inputs use appropriate autocomplete for purpose (R): fail
- ✅: Placeholder text is programmatically defined as a property (R): pass - No placeholder text present on text inputs
Axe WCAG Failures (3) ❌
- (3x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (5) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 17 (Claude Sonnet 4.5)
Instruction set: Control
FAIL | Latency 0.00s cached
Axe WCAG: 3 | BP: 5 | $0.0145
Assertions ❌
- ✅: Each text input has an accessible name (R): pass
- ✅: Visible label is included in accessible name (R): pass
- ✅: Each text input has textbox role (R): pass - Text input fields with textbox role found
- ❌: Helper text is programmatically associated (R): fail - Found `Please enter a valid email address (e.g., example@domain.com)`; Found `Preferred format: (123) 456-7890 or 123-456-7890`
- ✅: Text inputs are keyboard focusable (R): pass
- ✅: Visual labels are defined and persistent (R): pass
- ✅: Required fields are indicated (visually and programmatically) (R): pass
- ❌: Inputs use appropriate autocomplete for purpose (R): fail
- ✅: Placeholder text is programmatically defined as a property (R): pass - No placeholder text present on text inputs
Axe WCAG Failures (3) ❌
- (3x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (5) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 18 (Claude Sonnet 4.5)
Instruction set: Control
FAIL | Latency 0.00s cached
Axe WCAG: 3 | BP: 5 | $0.0145
Assertions ❌
- ✅: Each text input has an accessible name (R): pass
- ✅: Visible label is included in accessible name (R): pass
- ✅: Each text input has textbox role (R): pass - Text input fields with textbox role found
- ❌: Helper text is programmatically associated (R): fail - Found `Please enter a valid email address (e.g., example@domain.com)`; Found `Preferred format: (123) 456-7890 or 123-456-7890`
- ✅: Text inputs are keyboard focusable (R): pass
- ✅: Visual labels are defined and persistent (R): pass
- ✅: Required fields are indicated (visually and programmatically) (R): pass
- ❌: Inputs use appropriate autocomplete for purpose (R): fail
- ✅: Placeholder text is programmatically defined as a property (R): pass - No placeholder text present on text inputs
Axe WCAG Failures (3) ❌
- (3x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (5) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 19 (Claude Sonnet 4.5)
Instruction set: Control
FAIL | Latency 0.00s cached
Axe WCAG: 3 | BP: 5 | $0.0147
Assertions ❌
- ✅: Each text input has an accessible name (R): pass
- ✅: Visible label is included in accessible name (R): pass
- ✅: Each text input has textbox role (R): pass - Text input fields with textbox role found
- ❌: Helper text is programmatically associated (R): fail - Found `Please enter a valid email address (e.g., example@domain.com)`; Found `Preferred format: (123) 456-7890 or 123-456-7890`
- ✅: Text inputs are keyboard focusable (R): pass
- ✅: Visual labels are defined and persistent (R): pass
- ✅: Required fields are indicated (visually and programmatically) (R): pass
- ❌: Inputs use appropriate autocomplete for purpose (R): fail
- ✅: Placeholder text is programmatically defined as a property (R): pass - No placeholder text present on text inputs
Axe WCAG Failures (3) ❌
- (3x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (5) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 20 (Claude Sonnet 4.5)
Instruction set: Control
FAIL | Latency 0.00s cached
Axe WCAG: 1 | BP: 5 | $0.0145
Assertions ❌
- ✅: Each text input has an accessible name (R): pass
- ✅: Visible label is included in accessible name (R): pass
- ✅: Each text input has textbox role (R): pass - Text input fields with textbox role found
- ❌: Helper text is programmatically associated (R): fail - Found `Please enter a valid email address (e.g., example@domain.com)`; Found `Preferred format: (123) 456-7890 or 123-456-7890`
- ✅: Text inputs are keyboard focusable (R): pass
- ✅: Visual labels are defined and persistent (R): pass
- ✅: Required fields are indicated (visually and programmatically) (R): pass
- ❌: Inputs use appropriate autocomplete for purpose (R): fail
- ✅: Placeholder text is programmatically defined as a property (R): pass - No placeholder text present on text inputs
Axe WCAG Failures (1) ❌
- (1x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (5) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 21 (Claude Sonnet 4.5)
Instruction set: Control
FAIL | Latency 0.00s cached
Axe WCAG: 1 | BP: 5 | $0.0145
Assertions ❌
- ✅: Each text input has an accessible name (R): pass
- ✅: Visible label is included in accessible name (R): pass
- ✅: Each text input has textbox role (R): pass - Text input fields with textbox role found
- ❌: Helper text is programmatically associated (R): fail - Found `Please enter a valid email address (e.g., example@domain.com)`; Found `Preferred format: (123) 456-7890 or 123-456-7890`
- ✅: Text inputs are keyboard focusable (R): pass
- ✅: Visual labels are defined and persistent (R): pass
- ✅: Required fields are indicated (visually and programmatically) (R): pass
- ❌: Inputs use appropriate autocomplete for purpose (R): fail
- ✅: Placeholder text is programmatically defined as a property (R): pass - No placeholder text present on text inputs
Axe WCAG Failures (1) ❌
- (1x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (5) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 22 (Claude Sonnet 4.5)
Instruction set: Control
FAIL | Latency 0.00s cached
Axe WCAG: 1 | BP: 5 | $0.0145
Assertions ❌
- ✅: Each text input has an accessible name (R): pass
- ✅: Visible label is included in accessible name (R): pass
- ✅: Each text input has textbox role (R): pass - Text input fields with textbox role found
- ❌: Helper text is programmatically associated (R): fail - Found `Please enter a valid email address (e.g., example@domain.com)`; Found `Preferred format: (123) 456-7890 or 123-456-7890`
- ✅: Text inputs are keyboard focusable (R): pass
- ✅: Visual labels are defined and persistent (R): pass
- ✅: Required fields are indicated (visually and programmatically) (R): pass
- ❌: Inputs use appropriate autocomplete for purpose (R): fail
- ✅: Placeholder text is programmatically defined as a property (R): pass - No placeholder text present on text inputs
Axe WCAG Failures (1) ❌
- (1x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (5) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 23 (Claude Sonnet 4.5)
Instruction set: Control
FAIL | Latency 0.00s cached
Axe WCAG: 3 | BP: 5 | $0.0147
Assertions ❌
- ✅: Each text input has an accessible name (R): pass
- ✅: Visible label is included in accessible name (R): pass
- ✅: Each text input has textbox role (R): pass - Text input fields with textbox role found
- ❌: Helper text is programmatically associated (R): fail - Found `Please enter a valid email address (e.g., example@domain.com)`; Found `Preferred format: (123) 456-7890 or 123-456-7890`
- ✅: Text inputs are keyboard focusable (R): pass
- ✅: Visual labels are defined and persistent (R): pass
- ✅: Required fields are indicated (visually and programmatically) (R): pass
- ❌: Inputs use appropriate autocomplete for purpose (R): fail
- ✅: Placeholder text is programmatically defined as a property (R): pass - No placeholder text present on text inputs
Axe WCAG Failures (3) ❌
- (3x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (5) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 24 (Claude Sonnet 4.5)
Instruction set: Control
FAIL | Latency 0.00s cached
Axe WCAG: 3 | BP: 5 | $0.0145
Assertions ❌
- ✅: Each text input has an accessible name (R): pass
- ✅: Visible label is included in accessible name (R): pass
- ✅: Each text input has textbox role (R): pass - Text input fields with textbox role found
- ❌: Helper text is programmatically associated (R): fail - Found `Please enter a valid email address (e.g., example@domain.com)`; Found `Preferred format: (123) 456-7890 or 123-456-7890`
- ✅: Text inputs are keyboard focusable (R): pass
- ✅: Visual labels are defined and persistent (R): pass
- ✅: Required fields are indicated (visually and programmatically) (R): pass
- ❌: Inputs use appropriate autocomplete for purpose (R): fail
- ✅: Placeholder text is programmatically defined as a property (R): pass - No placeholder text present on text inputs
Axe WCAG Failures (3) ❌
- (3x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (5) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
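Every Control sample above fails axe's color-contrast rule. For reference, here is a minimal Python sketch of the WCAG 2 contrast-ratio calculation that the rule's threshold is defined against. The function names and the example colors are hypothetical and are not taken from the generated samples. axe-core's actual check also has to resolve the effective foreground and background of rendered elements, which this sketch does not attempt. WCAG 2 AA requires a ratio of at least 4.5:1 for normal-size text and 3:1 for large text.

```python
def _linearize(channel: int) -> float:
    """Convert one 8-bit sRGB channel to linear light (WCAG 2 definition)."""
    s = channel / 255
    return s / 12.92 if s <= 0.03928 else ((s + 0.055) / 1.055) ** 2.4


def relative_luminance(hex_color: str) -> float:
    """Relative luminance of a '#rrggbb' color, from 0.0 (black) to 1.0 (white)."""
    h = hex_color.lstrip("#")
    r, g, b = (int(h[i:i + 2], 16) for i in (0, 2, 4))
    return 0.2126 * _linearize(r) + 0.7152 * _linearize(g) + 0.0722 * _linearize(b)


def contrast_ratio(fg: str, bg: str) -> float:
    """WCAG 2 contrast ratio; the lighter color always goes in the numerator."""
    lighter, darker = sorted(
        (relative_luminance(fg), relative_luminance(bg)), reverse=True
    )
    return (lighter + 0.05) / (darker + 0.05)


def meets_aa(fg: str, bg: str, large_text: bool = False) -> bool:
    """True if the pair meets the WCAG 2 AA minimum for the given text size."""
    return contrast_ratio(fg, bg) >= (3.0 if large_text else 4.5)


# Hypothetical colors, for illustration only
for fg in ("#999999", "#767676", "#000000"):
    print(fg, round(contrast_ratio(fg, "#ffffff"), 2), meets_aa(fg, "#ffffff"))
```

Light gray helper text on a white background is a common way to fall below 4.5:1, though the report does not say which elements triggered the failures here.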
Sample 0 (Claude Sonnet 4.5)
Instruction set: 1. Basic
PASS | Latency 0.00s cached
Axe WCAG: 0 | BP: 5 | $0.0194
Assertions ✅
- ✅: Each text input has an accessible name (R): pass
- ✅: Visible label is included in accessible name (R): pass
- ✅: Each text input has textbox role (R): pass - Text input fields with textbox role found
- ✅: Helper text is programmatically associated (R): pass
- ✅: Text inputs are keyboard focusable (R): pass
- ✅: Visual labels are defined and persistent (R): pass
- ✅: Required fields are indicated (visually and programmatically) (R): pass
- ✅: Inputs use appropriate autocomplete for purpose (R): pass
- ✅: Placeholder text is programmatically defined as a property (R): pass - No placeholder text present on text inputs
Axe Best Practice Issues (5) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 1 (Claude Sonnet 4.5)
Instruction set: 1. Basic
FAIL | Latency 0.00s cached
Axe WCAG: 0 | BP: 5 | $0.0190
Assertions ❌
- ✅: Each text input has an accessible name (R): pass
- ✅: Visible label is included in accessible name (R): pass
- ✅: Each text input has textbox role (R): pass - Text input fields with textbox role found
- ✅: Helper text is programmatically associated (R): pass
- ✅: Text inputs are keyboard focusable (R): pass
- ✅: Visual labels are defined and persistent (R): pass
- ✅: Required fields are indicated (visually and programmatically) (R): pass
- ❌: Inputs use appropriate autocomplete for purpose (R): fail
- ✅: Placeholder text is programmatically defined as a property (R): pass - No placeholder text present on text inputs
Axe Best Practice Issues (5) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 2 (Claude Sonnet 4.5)
Instruction set: 1. Basic
FAIL | Latency 0.00s cached
Axe WCAG: 0 | BP: 5 | $0.0190
Assertions ❌
- ✅: Each text input has an accessible name (R): pass
- ✅: Visible label is included in accessible name (R): pass
- ✅: Each text input has textbox role (R): pass - Text input fields with textbox role found
- ✅: Helper text is programmatically associated (R): pass
- ✅: Text inputs are keyboard focusable (R): pass
- ✅: Visual labels are defined and persistent (R): pass
- ✅: Required fields are indicated (visually and programmatically) (R): pass
- ❌: Inputs use appropriate autocomplete for purpose (R): fail
- ✅: Placeholder text is programmatically defined as a property (R): pass - No placeholder text present on text inputs
Axe Best Practice Issues (5) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 3 (Claude Sonnet 4.5)
Instruction set: 1. Basic
PASS | Latency 0.00s cached
Axe WCAG: 0 | BP: 5 | $0.0187
Assertions ✅
- ✅: Each text input has an accessible name (R): pass
- ✅: Visible label is included in accessible name (R): pass
- ✅: Each text input has textbox role (R): pass - Text input fields with textbox role found
- ✅: Helper text is programmatically associated (R): pass
- ✅: Text inputs are keyboard focusable (R): pass
- ✅: Visual labels are defined and persistent (R): pass
- ✅: Required fields are indicated (visually and programmatically) (R): pass
- ✅: Inputs use appropriate autocomplete for purpose (R): pass
- ✅: Placeholder text is programmatically defined as a property (R): pass - No placeholder text present on text inputs
Axe Best Practice Issues (5) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 4 (Claude Sonnet 4.5)
Instruction set: 1. Basic
PASS | Latency 0.00s cached
Axe WCAG: 0 | $0.0191
Assertions ✅
- ✅: Each text input has an accessible name (R): pass
- ✅: Visible label is included in accessible name (R): pass
- ✅: Each text input has textbox role (R): pass - Text input fields with textbox role found
- ✅: Helper text is programmatically associated (R): pass
- ✅: Text inputs are keyboard focusable (R): pass
- ✅: Visual labels are defined and persistent (R): pass
- ✅: Required fields are indicated (visually and programmatically) (R): pass
- ✅: Inputs use appropriate autocomplete for purpose (R): pass
- ✅: Placeholder text is programmatically defined as a property (R): pass - No placeholder text present on text inputs
Sample 0 (Claude Sonnet 4.5)
Instruction set: 0. Minimal
FAIL | Latency 0.00s cached
Axe WCAG: 1 | BP: 5 | $0.0186
Assertions ✅
- ✅: Each text input has an accessible name (R): pass
- ✅: Visible label is included in accessible name (R): pass
- ✅: Each text input has textbox role (R): pass - Text input fields with textbox role found
- ✅: Helper text is programmatically associated (R): pass
- ✅: Text inputs are keyboard focusable (R): pass
- ✅: Visual labels are defined and persistent (R): pass
- ✅: Required fields are indicated (visually and programmatically) (R): pass
- ✅: Inputs use appropriate autocomplete for purpose (R): pass
- ✅: Placeholder text is programmatically defined as a property (R): pass - No placeholder text present on text inputs
Axe WCAG Failures (1) ❌
- (1x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (5) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
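The samples in this section suggest how the overall PASS/FAIL verdict is derived: a sample with all assertions passing still fails when Axe reports a WCAG failure, a sample with zero Axe WCAG failures still fails when one assertion fails, and best-practice issues are marked as not affecting pass/fail. A minimal Python sketch of that rule follows. The `SampleResult` type and its field names are hypothetical, and the rule is inferred from the results shown here rather than taken from the benchmark's code.

```python
from dataclasses import dataclass, field


@dataclass
class SampleResult:
    """Hypothetical container for one sample's results."""
    axe_wcag_failures: int
    assertion_results: dict[str, bool] = field(default_factory=dict)
    best_practice_issues: int = 0  # reported, but ignored for the verdict


def sample_passes(result: SampleResult) -> bool:
    """Inferred rule: no Axe WCAG failures and every assertion passes."""
    return result.axe_wcag_failures == 0 and all(result.assertion_results.values())


if __name__ == "__main__":
    # All assertions pass, but one color-contrast failure: expected FAIL.
    contrast_only = SampleResult(1, {"autocomplete": True, "helper text": True}, 5)
    # No Axe WCAG failures, but the autocomplete assertion fails: expected FAIL.
    autocomplete_only = SampleResult(0, {"autocomplete": False, "helper text": True}, 5)
    # Only best-practice issues remain: expected PASS.
    best_practice_only = SampleResult(0, {"autocomplete": True, "helper text": True}, 5)
    for r in (contrast_only, autocomplete_only, best_practice_only):
        print("PASS" if sample_passes(r) else "FAIL")
```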
Sample 1 (Claude Sonnet 4.5)
Instruction set: 0. Minimal
PASS | Latency 0.00s cached
Axe WCAG: 0 | BP: 5 | $0.0198
Assertions ✅
- ✅: Each text input has an accessible name (R): pass
- ✅: Visible label is included in accessible name (R): pass
- ✅: Each text input has textbox role (R): pass - Text input fields with textbox role found
- ✅: Helper text is programmatically associated (R): pass
- ✅: Text inputs are keyboard focusable (R): pass
- ✅: Visual labels are defined and persistent (R): pass
- ✅: Required fields are indicated (visually and programmatically) (R): pass
- ✅: Inputs use appropriate autocomplete for purpose (R): pass
- ✅: Placeholder text is programmatically defined as a property (R): pass - No placeholder text present on text inputs
Axe Best Practice Issues (5) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 2 (Claude Sonnet 4.5)
Instruction set: 0. Minimal
PASS | Latency 0.00s cached
Axe WCAG: 0 | BP: 5 | $0.0198
Assertions ✅
- ✅: Each text input has an accessible name (R): pass
- ✅: Visible label is included in accessible name (R): pass
- ✅: Each text input has textbox role (R): pass - Text input fields with textbox role found
- ✅: Helper text is programmatically associated (R): pass
- ✅: Text inputs are keyboard focusable (R): pass
- ✅: Visual labels are defined and persistent (R): pass
- ✅: Required fields are indicated (visually and programmatically) (R): pass
- ✅: Inputs use appropriate autocomplete for purpose (R): pass
- ✅: Placeholder text is programmatically defined as a property (R): pass - No placeholder text present on text inputs
Axe Best Practice Issues (5) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 3 (Claude Sonnet 4.5)
Instruction set: 0. Minimal
FAIL | Latency 0.00s cached
Axe WCAG: 1 | BP: 5 | $0.0181
Assertions ✅
- ✅: Each text input has an accessible name (R): pass
- ✅: Visible label is included in accessible name (R): pass
- ✅: Each text input has textbox role (R): pass - Text input fields with textbox role found
- ✅: Helper text is programmatically associated (R): pass
- ✅: Text inputs are keyboard focusable (R): pass
- ✅: Visual labels are defined and persistent (R): pass
- ✅: Required fields are indicated (visually and programmatically) (R): pass
- ✅: Inputs use appropriate autocomplete for purpose (R): pass
- ✅: Placeholder text is programmatically defined as a property (R): pass - No placeholder text present on text inputs
Axe WCAG Failures (1) ❌
- (1x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (5) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 4 (Claude Sonnet 4.5)
Instruction set: 0. Minimal
FAIL | Latency 0.00s cached
Axe WCAG: 1 | BP: 5 | $0.0196
Assertions ✅
- ✅: Each text input has an accessible name (R): pass
- ✅: Visible label is included in accessible name (R): pass
- ✅: Each text input has textbox role (R): pass - Text input fields with textbox role found
- ✅: Helper text is programmatically associated (R): pass
- ✅: Text inputs are keyboard focusable (R): pass
- ✅: Visual labels are defined and persistent (R): pass
- ✅: Required fields are indicated (visually and programmatically) (R): pass
- ✅: Inputs use appropriate autocomplete for purpose (R): pass
- ✅: Placeholder text is programmatically defined as a property (R): pass - No placeholder text present on text inputs
Axe WCAG Failures (1) ❌
- (1x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (5) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 0 (Claude Sonnet 4.5)
Instruction set: 2. Detailed Instructions
PASS | Latency 0.00s cached
Axe WCAG: 0 | $0.0487
Assertions ✅
- ✅: Each text input has an accessible name (R): pass
- ✅: Visible label is included in accessible name (R): pass
- ✅: Each text input has textbox role (R): pass - Text input fields with textbox role found
- ✅: Helper text is programmatically associated (R): pass
- ✅: Text inputs are keyboard focusable (R): pass
- ✅: Visual labels are defined and persistent (R): pass
- ✅: Required fields are indicated (visually and programmatically) (R): pass
- ✅: Inputs use appropriate autocomplete for purpose (R): pass
- ✅: Placeholder text is programmatically defined as a property (R): pass - No placeholder text present on text inputs
Sample 1 (Claude Sonnet 4.5)
Instruction set: 2. Detailed Instructions
PASS | Latency 0.00s cached
Axe WCAG: 0 | $0.0494
Assertions ✅
- ✅: Each text input has an accessible name (R): pass
- ✅: Visible label is included in accessible name (R): pass
- ✅: Each text input has textbox role (R): pass - Text input fields with textbox role found
- ✅: Helper text is programmatically associated (R): pass
- ✅: Text inputs are keyboard focusable (R): pass
- ✅: Visual labels are defined and persistent (R): pass
- ✅: Required fields are indicated (visually and programmatically) (R): pass
- ✅: Inputs use appropriate autocomplete for purpose (R): pass
- ✅: Placeholder text is programmatically defined as a property (R): pass - No placeholder text present on text inputs
Sample 2 (Claude Sonnet 4.5)
Instruction set: 2. Detailed Instructions
FAIL | Latency 26.29s
Axe WCAG: 0 | $0.0492
Assertions ❌
- ✅: Each text input has an accessible name (R): pass
- ✅: Visible label is included in accessible name (R): pass
- ✅: Each text input has textbox role (R): pass - Text input fields with textbox role found
- ✅: Helper text is programmatically associated (R): pass
- ✅: Text inputs are keyboard focusable (R): pass
- ✅: Visual labels are defined and persistent (R): pass
- ✅: Required fields are indicated (visually and programmatically) (R): pass
- ❌: Inputs use appropriate autocomplete for purpose (R): fail
- ✅: Placeholder text is programmatically defined as a property (R): pass - No placeholder text present on text inputs
Sample 3 (Claude Sonnet 4.5)
Instruction set: 2. Detailed Instructions
PASS | Latency 25.82s
Axe WCAG: 0 | $0.0497
Assertions ✅
- ✅: Each text input has an accessible name (R): pass
- ✅: Visible label is included in accessible name (R): pass
- ✅: Each text input has textbox role (R): pass - Text input fields with textbox role found
- ✅: Helper text is programmatically associated (R): pass
- ✅: Text inputs are keyboard focusable (R): pass
- ✅: Visual labels are defined and persistent (R): pass
- ✅: Required fields are indicated (visually and programmatically) (R): pass
- ✅: Inputs use appropriate autocomplete for purpose (R): pass
- ✅: Placeholder text is programmatically defined as a property (R): pass - No placeholder text present on text inputs
Sample 4 (Claude Sonnet 4.5)
Instruction set: 2. Detailed Instructions
PASS | Latency 25.50s
Axe WCAG: 0 | $0.0476
Assertions ✅
- ✅: Each text input has an accessible name (R): pass
- ✅: Visible label is included in accessible name (R): pass
- ✅: Each text input has textbox role (R): pass - Text input fields with textbox role found
- ✅: Helper text is programmatically associated (R): pass
- ✅: Text inputs are keyboard focusable (R): pass
- ✅: Visual labels are defined and persistent (R): pass
- ✅: Required fields are indicated (visually and programmatically) (R): pass
- ✅: Inputs use appropriate autocomplete for purpose (R): pass
- ✅: Placeholder text is programmatically defined as a property (R): pass - No placeholder text present on text inputs
Gemini 3 Flash Preview
Pass rates: 0%, 40%, 100%, 100%
Aggregates are shown when filtering to a specific instruction set.
| Samples | Passes | pass@1 | pass@5 | pass@10 |
|---|---|---|---|---|
| 25 | 0 | 0% | 0% | 0% |
| 5 | 2 | 40% | 100% | 100% |
| 5 | 5 | 100% | 100% | 100% |
| 5 | 5 | 100% | 100% | 100% |
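The report does not state how pass@k is computed. The aggregates above are consistent with the standard unbiased estimator, 1 - C(n - c, k) / C(n, k), where n is the number of samples and c the number of passes. The Python sketch below assumes that estimator; the function name is hypothetical. Returning 100% whenever fewer than k failing samples exist (which includes k larger than n, as in pass@10 over 5 samples) is also an assumption, chosen to match the figures shown.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Estimate the chance that at least one of k samples, drawn without
    replacement from n samples with c passes, is a pass."""
    if c == 0:
        return 0.0  # no passing sample can ever be drawn
    if n - c < k:
        return 1.0  # fewer than k failures exist, so any draw of k includes a pass
    return 1.0 - comb(n - c, k) / comb(n, k)


if __name__ == "__main__":
    # (samples, passes) pairs taken from the aggregates above.
    for n, c in ((25, 0), (5, 2), (5, 5)):
        row = [f"{pass_at_k(n, c, k):.0%}" for k in (1, 5, 10)]
        print(n, c, row)
```

With 5 samples and 2 passes, pass@1 is 2/5 = 40%, and pass@5 is 100% because only 3 failing samples exist, so any 5 drawn must include a pass.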
Sample 0 (Gemini 3 Flash Preview)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 1 | BP: 5
Assertions ❌
- ✅: Each text input has an accessible name (R): pass
- ✅: Visible label is included in accessible name (R): pass
- ✅: Each text input has textbox role (R): pass - Text input fields with textbox role found
- ❌: Helper text is programmatically associated (R): fail - Found `Please enter a valid email address (e.g., name@example.com).`; Found `Preferred format: (555) 555-5555`
- ✅: Text inputs are keyboard focusable (R): pass
- ✅: Visual labels are defined and persistent (R): pass
- ❌: Required fields are indicated (visually and programmatically) (R): fail - (2x) Input is programmatically required but has no visual required indicator
- ❌: Inputs use appropriate autocomplete for purpose (R): fail
- ✅: Placeholder text is programmatically defined as a property (R): pass - No placeholder text present on text inputs
Axe WCAG Failures (1) ❌
- (1x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (5) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- page-has-heading-one (moderate): Ensure that the page, or at least one of its frames contains a level-one heading (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
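The failing "Helper text is programmatically associated" results list hint text that was found on the page without a programmatic link to its input. One common way to create that link is an `aria-describedby` attribute on the input that references the `id` of the hint element. The Python sketch below checks for that pattern in a static HTML string. It is an illustration only, not the benchmark's assertion: the function and class names are hypothetical, and it assumes every `<input>` has helper text that should be referenced.

```python
from html.parser import HTMLParser


class _Collector(HTMLParser):
    """Collects every element id and the attributes of every <input>."""

    def __init__(self) -> None:
        super().__init__()
        self.ids: set[str] = set()
        self.inputs: list[dict] = []

    def handle_starttag(self, tag, attrs):
        attr_map = dict(attrs)
        if attr_map.get("id"):
            self.ids.add(attr_map["id"])
        if tag == "input":
            self.inputs.append(attr_map)


def inputs_missing_description(html: str) -> list[str]:
    """Return ids (or names) of inputs whose aria-describedby is absent
    or points at ids that do not exist in the document."""
    collector = _Collector()
    collector.feed(html)
    missing = []
    for attrs in collector.inputs:
        refs = (attrs.get("aria-describedby") or "").split()
        if not refs or not all(ref in collector.ids for ref in refs):
            missing.append(attrs.get("id") or attrs.get("name") or "<unnamed>")
    return missing


if __name__ == "__main__":
    unlinked = '<input id="email" type="email"><p>Please enter a valid email address.</p>'
    linked = (
        '<input id="email" type="email" aria-describedby="email-hint">'
        '<p id="email-hint">Please enter a valid email address.</p>'
    )
    print(inputs_missing_description(unlinked))
    print(inputs_missing_description(linked))
```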
Sample 1 (Gemini 3 Flash Preview)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 1 | BP: 5
Assertions ❌
- ✅: Each text input has an accessible name (R): pass
- ✅: Visible label is included in accessible name (R): pass
- ✅: Each text input has textbox role (R): pass - Text input fields with textbox role found
- ❌: Helper text is programmatically associated (R): fail - Found `Please enter a valid email address (e.g., name@example.com).`; Found `Preferred format: (123) 456-7890`
- ✅: Text inputs are keyboard focusable (R): pass
- ✅: Visual labels are defined and persistent (R): pass
- ❌: Required fields are indicated (visually and programmatically) (R): fail - (2x) Input is programmatically required but has no visual required indicator
- ❌: Inputs use appropriate autocomplete for purpose (R): fail
- ✅: Placeholder text is programmatically defined as a property (R): pass - No placeholder text present on text inputs
Axe WCAG Failures (1) ❌
- (1x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (5) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 2 (Gemini 3 Flash Preview)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 0 | BP: 5
Assertions ❌
- ✅: Each text input has an accessible name (R): pass
- ✅: Visible label is included in accessible name (R): pass
- ✅: Each text input has textbox role (R): pass - Text input fields with textbox role found
- ❌: Helper text is programmatically associated (R): fail - Found `Please enter a valid email format, such as name@example.com.`; Found `Preferred format: (555) 000-0000.`
- ✅: Text inputs are keyboard focusable (R): pass
- ✅: Visual labels are defined and persistent (R): pass
- ❌: Required fields are indicated (visually and programmatically) (R): fail - (2x) Input is programmatically required but has no visual required indicator
- ❌: Inputs use appropriate autocomplete for purpose (R): fail
- ✅: Placeholder text is programmatically defined as a property (R): pass - No placeholder text present on text inputs
Axe Best Practice Issues (5) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- page-has-heading-one (moderate): Ensure that the page, or at least one of its frames contains a level-one heading (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 3 (Gemini 3 Flash Preview)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 1 | BP: 5
Assertions ❌
- ✅: Each text input has an accessible name (R): pass
- ✅: Visible label is included in accessible name (R): pass
- ✅: Each text input has textbox role (R): pass - Text input fields with textbox role found
- ✅: Helper text is programmatically associated (R): pass
- ✅: Text inputs are keyboard focusable (R): pass
- ✅: Visual labels are defined and persistent (R): pass
- ❌: Required fields are indicated (visually and programmatically) (R): fail - (2x) Input is programmatically required but has no visual required indicator
- ❌: Inputs use appropriate autocomplete for purpose (R): fail
- ✅: Placeholder text is programmatically defined as a property (R): pass - No placeholder text present on text inputs
Axe WCAG Failures (1) ❌
- (1x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (5) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 4 (Gemini 3 Flash Preview)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 1 | BP: 5
Assertions ❌
- ✅: Each text input has an accessible name (R): pass
- ✅: Visible label is included in accessible name (R): pass
- ✅: Each text input has textbox role (R): pass - Text input fields with textbox role found
- ✅: Helper text is programmatically associated (R): pass
- ✅: Text inputs are keyboard focusable (R): pass
- ✅: Visual labels are defined and persistent (R): pass
- ❌: Required fields are indicated (visually and programmatically) (R): fail - (2x) Input is programmatically required but has no visual required indicator
- ❌: Inputs use appropriate autocomplete for purpose (R): fail
- ✅: Placeholder text is programmatically defined as a property (R): pass - No placeholder text present on text inputs
Axe WCAG Failures (1) ❌
- (1x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (5) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 5 (Gemini 3 Flash Preview)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 1 | BP: 5
Assertions ❌
- ✅: Each text input has an accessible name (R): pass
- ✅: Visible label is included in accessible name (R): pass
- ✅: Each text input has textbox role (R): pass - Text input fields with textbox role found
- ❌: Helper text is programmatically associated (R): fail - Found `Please enter a valid email address (e.g., name@example.com).`; Found `Preferred format: (555) 555-5555`
- ✅: Text inputs are keyboard focusable (R): pass
- ✅: Visual labels are defined and persistent (R): pass
- ❌: Required fields are indicated (visually and programmatically) (R): fail - (2x) Input is programmatically required but has no visual required indicator
- ❌: Inputs use appropriate autocomplete for purpose (R): fail
- ✅: Placeholder text is programmatically defined as a property (R): pass - No placeholder text present on text inputs
Axe WCAG Failures (1) ❌
- (1x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (5) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 6 (Gemini 3 Flash Preview)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 1 | BP: 5
Assertions ❌
- ✅: Each text input has an accessible name (R): pass
- ✅: Visible label is included in accessible name (R): pass
- ✅: Each text input has textbox role (R): pass - Text input fields with textbox role found
- ❌: Helper text is programmatically associated (R): fail - Found `Please enter a valid email address (e.g., name@example.com).`; Found `Preferred format: (555) 000-0000.`
- ✅: Text inputs are keyboard focusable (R): pass
- ✅: Visual labels are defined and persistent (R): pass
- ❌: Required fields are indicated (visually and programmatically) (R): fail - (2x) Input is programmatically required but has no visual required indicator
- ❌: Inputs use appropriate autocomplete for purpose (R): fail
- ✅: Placeholder text is programmatically defined as a property (R): pass - No placeholder text present on text inputs
Axe WCAG Failures (1) ❌
- (1x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (5) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- page-has-heading-one (moderate): Ensure that the page, or at least one of its frames contains a level-one heading (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 7 (Gemini 3 Flash Preview)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 0 | BP: 5
Assertions ❌
- ✅: Each text input has an accessible name (R): pass
- ✅: Visible label is included in accessible name (R): pass
- ✅: Each text input has textbox role (R): pass - Text input fields with textbox role found
- ❌: Helper text is programmatically associated (R): fail - Found `Please enter a valid email address, such as name@example.com.`; Found `Preferred format: (555) 000-0000. This field is optional.`
- ✅: Text inputs are keyboard focusable (R): pass
- ✅: Visual labels are defined and persistent (R): pass
- ❌: Required fields are indicated (visually and programmatically) (R): fail - (2x) Input is programmatically required but has no visual required indicator
- ❌: Inputs use appropriate autocomplete for purpose (R): fail
- ✅: Placeholder text is programmatically defined as a property (R): pass - No placeholder text present on text inputs
Axe Best Practice Issues (5) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- page-has-heading-one (moderate): Ensure that the page, or at least one of its frames contains a level-one heading (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 8 (Gemini 3 Flash Preview)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 1 | BP: 5
Assertions ❌
- ✅: Each text input has an accessible name (R): pass
- ✅: Visible label is included in accessible name (R): pass
- ✅: Each text input has textbox role (R): pass - Text input fields with textbox role found
- ❌: Helper text is programmatically associated (R): fail - Found `Please enter a valid email address (e.g., user@example.com).`; Found `Preferred format: (123) 456-7890`
- ✅: Text inputs are keyboard focusable (R): pass
- ✅: Visual labels are defined and persistent (R): pass
- ❌: Required fields are indicated (visually and programmatically) (R): fail - (2x) Input is programmatically required but has no visual required indicator
- ❌: Inputs use appropriate autocomplete for purpose (R): fail
- ✅: Placeholder text is programmatically defined as a property (R): pass - No placeholder text present on text inputs
Axe WCAG Failures (1) ❌
- (1x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (5) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- page-has-heading-one (moderate): Ensure that the page, or at least one of its frames contains a level-one heading (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 9 (Gemini 3 Flash Preview)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 1 | BP: 5
Assertions ❌
- ✅: Each text input has an accessible name (R): pass
- ✅: Visible label is included in accessible name (R): pass
- ✅: Each text input has textbox role (R): pass - Text input fields with textbox role found
- ❌: Helper text is programmatically associated (R): fail - Found `Please enter a valid email address (e.g., name@example.com).`; Found `Preferred format: (555) 000-0000`
- ✅: Text inputs are keyboard focusable (R): pass
- ✅: Visual labels are defined and persistent (R): pass
- ❌: Required fields are indicated (visually and programmatically) (R): fail - (2x) Input is programmatically required but has no visual required indicator
- ❌: Inputs use appropriate autocomplete for purpose (R): fail
- ✅: Placeholder text is programmatically defined as a property (R): pass - No placeholder text present on text inputs
Axe WCAG Failures (1) ❌
- (1x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (5) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- page-has-heading-one (moderate): Ensure that the page, or at least one of its frames contains a level-one heading (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 10 (Gemini 3 Flash Preview)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 1 | BP: 5
Assertions ❌
- ✅: Each text input has an accessible name (R): pass
- ✅: Visible label is included in accessible name (R): pass
- ✅: Each text input has textbox role (R): pass - Text input fields with textbox role found
- ❌: Helper text is programmatically associated (R): fail - Found `Please enter a valid email address (e.g., name@example.com).`; Found `Preferred format: (555) 000-0000`
- ✅: Text inputs are keyboard focusable (R): pass
- ✅: Visual labels are defined and persistent (R): pass
- ❌: Required fields are indicated (visually and programmatically) (R): fail - (2x) Input is programmatically required but has no visual required indicator
- ❌: Inputs use appropriate autocomplete for purpose (R): fail
- ✅: Placeholder text is programmatically defined as a property (R): pass - No placeholder text present on text inputs
Axe WCAG Failures (1) ❌
- (1x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (5) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 11 (Gemini 3 Flash Preview)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 1 | BP: 5
Assertions ❌
- ✅: Each text input has an accessible name (R): pass
- ✅: Visible label is included in accessible name (R): pass
- ✅: Each text input has textbox role (R): pass - Text input fields with textbox role found
- ❌: Helper text is programmatically associated (R): fail - Found `Please enter a valid email address (e.g., name@example.com).`; Found `Preferred format: (555) 000-0000. This field is optional.`
- ✅: Text inputs are keyboard focusable (R): pass
- ✅: Visual labels are defined and persistent (R): pass
- ❌: Required fields are indicated (visually and programmatically) (R): fail - (2x) Input is programmatically required but has no visual required indicator
- ❌: Inputs use appropriate autocomplete for purpose (R): fail
- ✅: Placeholder text is programmatically defined as a property (R): pass - No placeholder text present on text inputs
Axe WCAG Failures (1) ❌
- (1x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (5) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
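The helper-text assertion in the sample above fails when visible hint text is not linked to its input. The report does not show how the assertion is implemented, so the following is only a minimal sketch of one way to check it. It assumes the association is made with `aria-describedby` and that every input in the snippet has helper text. The class name, function name, and sample markup are hypothetical.

```python
from html.parser import HTMLParser


class DescribedByChecker(HTMLParser):
    """Collects element ids and each input's aria-describedby references."""

    def __init__(self):
        super().__init__()
        self.ids = set()
        self.inputs = []  # list of (input id, [referenced ids])

    def handle_starttag(self, tag, attrs):
        attr = dict(attrs)
        if attr.get("id"):
            self.ids.add(attr["id"])
        if tag == "input":
            refs = (attr.get("aria-describedby") or "").split()
            self.inputs.append((attr.get("id"), refs))


def inputs_missing_description(html):
    """Return ids of inputs whose aria-describedby points at no existing element."""
    checker = DescribedByChecker()
    checker.feed(html)
    return [
        input_id
        for input_id, refs in checker.inputs
        if not any(ref in checker.ids for ref in refs)
    ]


# Hypothetical markup: the hint is visible but not associated with the input.
unlinked = (
    '<label for="email">Email</label><input id="email" type="email">'
    "<p>Please enter a valid email address (e.g., name@example.com).</p>"
)
# Same markup with the hint referenced through aria-describedby.
linked = (
    '<label for="email">Email</label>'
    '<input id="email" type="email" aria-describedby="email-hint">'
    '<p id="email-hint">Please enter a valid email address (e.g., name@example.com).</p>'
)
```

Under these assumptions, `inputs_missing_description(unlinked)` reports the `email` input and `inputs_missing_description(linked)` reports nothing.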
Sample 12 (Gemini 3 Flash Preview)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 1 | BP: 5
Assertions ❌
- ✅: Each text input has an accessible name (R): pass
- ✅: Visible label is included in accessible name (R): pass
- ✅: Each text input has textbox role (R): pass - Text input fields with textbox role found
- ❌: Helper text is programmatically associated (R): fail - Found `Please enter a valid email address (e.g., name@example.com).` Found `Preferred format: (555) 000-0000`
- ✅: Text inputs are keyboard focusable (R): pass
- ✅: Visual labels are defined and persistent (R): pass
- ❌: Required fields are indicated (visually and programmatically) (R): fail - (2x) Input is programmatically required but has no visual required indicator
- ❌: Inputs use appropriate autocomplete for purpose (R): fail
- ✅: Placeholder text is programmatically defined as a property (R): pass - No placeholder text present on text inputs
Axe WCAG Failures (1) ❌
- (1x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (5) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- page-has-heading-one (moderate): Ensure that the page, or at least one of its frames contains a level-one heading (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
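The required-field failure in the sample above means an input carries `required` in markup while its label gives sighted users no cue. As a rough sketch only, the check below treats an asterisk or the word "required" in the label text as the visual indicator. That heuristic, the names, and the sample markup are assumptions, not the benchmark's actual assertion.

```python
from html.parser import HTMLParser


class RequiredLabelParser(HTMLParser):
    """Collects required input ids and the text of each <label for="...">."""

    def __init__(self):
        super().__init__()
        self.required_ids = []
        self.label_text = {}  # target input id -> accumulated label text
        self._current_for = None

    def handle_starttag(self, tag, attrs):
        attr = dict(attrs)
        if tag == "input" and ("required" in attr or attr.get("aria-required") == "true"):
            self.required_ids.append(attr.get("id"))
        elif tag == "label":
            self._current_for = attr.get("for")
            self.label_text.setdefault(self._current_for, "")

    def handle_endtag(self, tag):
        if tag == "label":
            self._current_for = None

    def handle_data(self, data):
        # Text inside a label (including nested spans) counts as label text.
        if self._current_for is not None:
            self.label_text[self._current_for] += data


def required_without_visual_indicator(html):
    """Return ids of required inputs whose label shows no '*' or 'required'."""
    parser = RequiredLabelParser()
    parser.feed(html)
    missing = []
    for input_id in parser.required_ids:
        text = parser.label_text.get(input_id, "").lower()
        if "*" not in text and "required" not in text:
            missing.append(input_id)
    return missing


# Hypothetical markup: programmatically required, no visible cue.
no_cue = '<label for="name">Full name</label><input id="name" required>'
# Same field with a visible asterisk in the label.
with_cue = (
    '<label for="name">Full name <span aria-hidden="true">*</span></label>'
    '<input id="name" required>'
)
```

With this heuristic, `required_without_visual_indicator(no_cue)` flags `name`, and the `with_cue` variant passes.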
Sample 13 (Gemini 3 Flash Preview)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 1 | BP: 5
Assertions ❌
- ✅: Each text input has an accessible name (R): pass
- ✅: Visible label is included in accessible name (R): pass
- ✅: Each text input has textbox role (R): pass - Text input fields with textbox role found
- ❌: Helper text is programmatically associated (R): fail - Found `Please enter a valid email address (e.g., name@example.com).` Found `Preferred format: (555) 000-0000`
- ✅: Text inputs are keyboard focusable (R): pass
- ✅: Visual labels are defined and persistent (R): pass
- ❌: Required fields are indicated (visually and programmatically) (R): fail - (2x) Input is programmatically required but has no visual required indicator
- ❌: Inputs use appropriate autocomplete for purpose (R): fail
- ✅: Placeholder text is programmatically defined as a property (R): pass - No placeholder text present on text inputs
Axe WCAG Failures (1) ❌
- (1x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (5) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
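The autocomplete assertion in the sample above reports only `fail`, with no detail. One plausible reading is that inputs collecting personal data lack the matching `autocomplete` token. The sketch below checks that reading for email and phone inputs. The type-to-token mapping, the names, and the sample markup are assumptions made for illustration.

```python
from html.parser import HTMLParser

# Assumed mapping from input type to the autocomplete token expected for it.
EXPECTED_TOKEN_BY_TYPE = {"email": "email", "tel": "tel"}


class AutocompleteAudit(HTMLParser):
    """Records inputs whose autocomplete value lacks the expected token."""

    def __init__(self):
        super().__init__()
        self.failures = []  # list of (input id, expected token)

    def handle_starttag(self, tag, attrs):
        if tag != "input":
            return
        attr = dict(attrs)
        expected = EXPECTED_TOKEN_BY_TYPE.get((attr.get("type") or "").lower())
        # autocomplete may hold several space-separated tokens.
        tokens = (attr.get("autocomplete") or "").lower().split()
        if expected and expected not in tokens:
            self.failures.append((attr.get("id"), expected))


def missing_autocomplete(html):
    """Return (input id, expected token) pairs for inputs missing the token."""
    audit = AutocompleteAudit()
    audit.feed(html)
    return audit.failures


# Hypothetical form: the email input has no autocomplete, the phone input does.
form = (
    '<input id="email" type="email">'
    '<input id="phone" type="tel" autocomplete="tel">'
)
```

Here `missing_autocomplete(form)` returns one entry, for the `email` input.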
Sample 14 (Gemini 3 Flash Preview)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 1 | BP: 5
Assertions ❌
- ✅: Each text input has an accessible name (R): pass
- ✅: Visible label is included in accessible name (R): pass
- ✅: Each text input has textbox role (R): pass - Text input fields with textbox role found
- ❌: Helper text is programmatically associated (R): fail - Found `Please enter a valid email address (e.g., name@example.com).` Found `Preferred format: (555) 000-0000`
- ✅: Text inputs are keyboard focusable (R): pass
- ✅: Visual labels are defined and persistent (R): pass
- ❌: Required fields are indicated (visually and programmatically) (R): fail - (2x) Input is programmatically required but has no visual required indicator
- ❌: Inputs use appropriate autocomplete for purpose (R): fail
- ✅: Placeholder text is programmatically defined as a property (R): pass - No placeholder text present on text inputs
Axe WCAG Failures (1) ❌
- (1x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (5) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- page-has-heading-one (moderate): Ensure that the page, or at least one of its frames contains a level-one heading (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
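The `color-contrast` failure listed for the sample above refers to the WCAG 2 AA thresholds: a contrast ratio of at least 4.5:1 for normal text and 3:1 for large text. The sketch below computes that ratio from two hex colors using the WCAG 2 relative-luminance definition. The function names and example colors are illustrative and are not taken from the generated samples.

```python
def _linearize(channel):
    """Convert an 8-bit sRGB channel to its linear value (WCAG 2 definition)."""
    c = channel / 255
    return c / 12.92 if c <= 0.03928 else ((c + 0.055) / 1.055) ** 2.4


def relative_luminance(hex_color):
    """Relative luminance of a #rrggbb color."""
    hex_color = hex_color.lstrip("#")
    r, g, b = (int(hex_color[i:i + 2], 16) for i in (0, 2, 4))
    return 0.2126 * _linearize(r) + 0.7152 * _linearize(g) + 0.0722 * _linearize(b)


def contrast_ratio(foreground, background):
    """Contrast ratio (lighter + 0.05) / (darker + 0.05), from 1 to 21."""
    lighter, darker = sorted(
        (relative_luminance(foreground), relative_luminance(background)),
        reverse=True,
    )
    return (lighter + 0.05) / (darker + 0.05)


def meets_aa(foreground, background, large_text=False):
    """True if the pair meets the WCAG 2 AA minimum contrast threshold."""
    return contrast_ratio(foreground, background) >= (3.0 if large_text else 4.5)
```

For example, `meets_aa("#767676", "#ffffff")` is true, while the slightly lighter `#777777` on white falls just under 4.5:1 and fails for normal text.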
Sample 15 (Gemini 3 Flash Preview)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 1 | BP: 5
Assertions ❌
- ✅: Each text input has an accessible name (R): pass
- ✅: Visible label is included in accessible name (R): pass
- ✅: Each text input has textbox role (R): pass - Text input fields with textbox role found
- ❌: Helper text is programmatically associated (R): fail - Found `Please enter a valid email address (e.g., name@example.com).` Found `Preferred format: (123) 456-7890. This field is optional.`
- ✅: Text inputs are keyboard focusable (R): pass
- ✅: Visual labels are defined and persistent (R): pass
- ❌: Required fields are indicated (visually and programmatically) (R): fail - (2x) Input is programmatically required but has no visual required indicator
- ❌: Inputs use appropriate autocomplete for purpose (R): fail
- ✅: Placeholder text is programmatically defined as a property (R): pass - No placeholder text present on text inputs
Axe WCAG Failures (1) ❌
- (1x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (5) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- page-has-heading-one (moderate): Ensure that the page, or at least one of its frames contains a level-one heading (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 16 (Gemini 3 Flash Preview)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 1 | BP: 5
Assertions ❌
- ✅: Each text input has an accessible name (R): pass
- ✅: Visible label is included in accessible name (R): pass
- ✅: Each text input has textbox role (R): pass - Text input fields with textbox role found
- ❌: Helper text is programmatically associated (R): fail - Found `Please enter a valid email address (e.g., name@example.com).` Found `Preferred format: +1 (555) 000-0000.`
- ✅: Text inputs are keyboard focusable (R): pass
- ✅: Visual labels are defined and persistent (R): pass
- ❌: Required fields are indicated (visually and programmatically) (R): fail - (2x) Input is programmatically required but has no visual required indicator
- ❌: Inputs use appropriate autocomplete for purpose (R): fail
- ✅: Placeholder text is programmatically defined as a property (R): pass - No placeholder text present on text inputs
Axe WCAG Failures (1) ❌
- (1x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (5) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- page-has-heading-one (moderate): Ensure that the page, or at least one of its frames contains a level-one heading (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 17 (Gemini 3 Flash Preview)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 1 | BP: 5
Assertions ❌
- ✅: Each text input has an accessible name (R): pass
- ✅: Visible label is included in accessible name (R): pass
- ✅: Each text input has textbox role (R): pass - Text input fields with textbox role found
- ❌: Helper text is programmatically associated (R): fail - Found `Please enter a valid email address (e.g., name@example.com).` Found `Preferred format: (123) 456-7890`
- ✅: Text inputs are keyboard focusable (R): pass
- ✅: Visual labels are defined and persistent (R): pass
- ❌: Required fields are indicated (visually and programmatically) (R): fail - (2x) Input is programmatically required but has no visual required indicator
- ❌: Inputs use appropriate autocomplete for purpose (R): fail
- ✅: Placeholder text is programmatically defined as a property (R): pass - No placeholder text present on text inputs
Axe WCAG Failures (1) ❌
- (1x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (5) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 18 (Gemini 3 Flash Preview)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 1 | BP: 5
Assertions ❌
- ✅: Each text input has an accessible name (R): pass
- ✅: Visible label is included in accessible name (R): pass
- ✅: Each text input has textbox role (R): pass - Text input fields with textbox role found
- ❌: Helper text is programmatically associated (R): fail - Found `Please enter a valid email address (e.g., name@example.com).` Found `Preferred format: +1 (555) 000-0000.`
- ✅: Text inputs are keyboard focusable (R): pass
- ✅: Visual labels are defined and persistent (R): pass
- ❌: Required fields are indicated (visually and programmatically) (R): fail - (2x) Input is programmatically required but has no visual required indicator
- ❌: Inputs use appropriate autocomplete for purpose (R): fail
- ✅: Placeholder text is programmatically defined as a property (R): pass - No placeholder text present on text inputs
Axe WCAG Failures (1) ❌
- (1x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (5) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 19 (Gemini 3 Flash Preview)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 1 | BP: 5
Assertions ❌
- ✅: Each text input has an accessible name (R): pass
- ✅: Visible label is included in accessible name (R): pass
- ✅: Each text input has textbox role (R): pass - Text input fields with textbox role found
- ❌: Helper text is programmatically associated (R): fail - Found `Please enter a valid email address (e.g., name@example.com).` Found `Preferred format: (555) 000-0000`
- ✅: Text inputs are keyboard focusable (R): pass
- ✅: Visual labels are defined and persistent (R): pass
- ❌: Required fields are indicated (visually and programmatically) (R): fail - (2x) Input is programmatically required but has no visual required indicator
- ❌: Inputs use appropriate autocomplete for purpose (R): fail
- ✅: Placeholder text is programmatically defined as a property (R): pass - No placeholder text present on text inputs
Axe WCAG Failures (1) ❌
- (1x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (5) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- page-has-heading-one (moderate): Ensure that the page, or at least one of its frames contains a level-one heading (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 20 (Gemini 3 Flash Preview)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 1 | BP: 5
Assertions ❌
- ✅: Each text input has an accessible name (R): pass
- ✅: Visible label is included in accessible name (R): pass
- ✅: Each text input has textbox role (R): pass - Text input fields with textbox role found
- ❌: Helper text is programmatically associated (R): fail - Found `Please enter a valid email address (e.g., name@example.com).` Found `Preferred format: (123) 456-7890`
- ✅: Text inputs are keyboard focusable (R): pass
- ✅: Visual labels are defined and persistent (R): pass
- ❌: Required fields are indicated (visually and programmatically) (R): fail - (2x) Input is programmatically required but has no visual required indicator
- ❌: Inputs use appropriate autocomplete for purpose (R): fail
- ✅: Placeholder text is programmatically defined as a property (R): pass - No placeholder text present on text inputs
Axe WCAG Failures (1) ❌
- (1x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (5) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 21 (Gemini 3 Flash Preview)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 1 | BP: 5
Assertions ❌
- ✅: Each text input has an accessible name (R): pass
- ✅: Visible label is included in accessible name (R): pass
- ✅: Each text input has textbox role (R): pass - Text input fields with textbox role found
- ❌: Helper text is programmatically associated (R): fail - Found `Please enter a valid email address (e.g., name@example.com).` Found `Optional. Preferred format: (555) 555-5555.`
- ✅: Text inputs are keyboard focusable (R): pass
- ✅: Visual labels are defined and persistent (R): pass
- ❌: Required fields are indicated (visually and programmatically) (R): fail - (2x) Input is programmatically required but has no visual required indicator
- ❌: Inputs use appropriate autocomplete for purpose (R): fail
- ✅: Placeholder text is programmatically defined as a property (R): pass - No placeholder text present on text inputs
Axe WCAG Failures (1) ❌
- (1x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (5) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 22 (Gemini 3 Flash Preview)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 1 | BP: 5
Assertions ❌
- ✅: Each text input has an accessible name (R): pass
- ✅: Visible label is included in accessible name (R): pass
- ✅: Each text input has textbox role (R): pass - Text input fields with textbox role found
- ❌: Helper text is programmatically associated (R): fail - Found `Please enter a valid email address (e.g., name@example.com).` Found `Preferred format: (123) 456-7890`
- ✅: Text inputs are keyboard focusable (R): pass
- ✅: Visual labels are defined and persistent (R): pass
- ❌: Required fields are indicated (visually and programmatically) (R): fail - (2x) Input is programmatically required but has no visual required indicator
- ❌: Inputs use appropriate autocomplete for purpose (R): fail
- ✅: Placeholder text is programmatically defined as a property (R): pass - No placeholder text present on text inputs
Axe WCAG Failures (1) ❌
- (1x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (5) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 23 (Gemini 3 Flash Preview)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 1 | BP: 5
Assertions ❌
- ✅: Each text input has an accessible name (R): pass
- ✅: Visible label is included in accessible name (R): pass
- ✅: Each text input has textbox role (R): pass - Text input fields with textbox role found
- ❌: Helper text is programmatically associated (R): fail - Found `Please enter a valid email address (e.g., name@example.com).` Found `Preferred format: (123) 456-7890`
- ✅: Text inputs are keyboard focusable (R): pass
- ✅: Visual labels are defined and persistent (R): pass
- ❌: Required fields are indicated (visually and programmatically) (R): fail - (2x) Input is programmatically required but has no visual required indicator
- ❌: Inputs use appropriate autocomplete for purpose (R): fail
- ✅: Placeholder text is programmatically defined as a property (R): pass - No placeholder text present on text inputs
Axe WCAG Failures (1) ❌
- (1x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (5) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 24 (Gemini 3 Flash Preview)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 1 | BP: 5
Assertions ❌
- ✅: Each text input has an accessible name (R): pass
- ✅: Visible label is included in accessible name (R): pass
- ✅: Each text input has textbox role (R): pass - Text input fields with textbox role found
- ❌: Helper text is programmatically associated (R): fail - Found `Please enter a valid email address (e.g., name@example.com).` Found `Preferred format: (123) 456-7890`
- ✅: Text inputs are keyboard focusable (R): pass
- ✅: Visual labels are defined and persistent (R): pass
- ❌: Required fields are indicated (visually and programmatically) (R): fail - (2x) Input is programmatically required but has no visual required indicator
- ❌: Inputs use appropriate autocomplete for purpose (R): fail
- ✅: Placeholder text is programmatically defined as a property (R): pass - No placeholder text present on text inputs
Axe WCAG Failures (1) ❌
- (1x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (5) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 0 (Gemini 3 Flash Preview)
Instruction set: 1. Basic
PASS | Latency 0.00s
Axe WCAG: 0 | BP: 5
Assertions ✅
- ✅: Each text input has an accessible name (R): pass
- ✅: Visible label is included in accessible name (R): pass
- ✅: Each text input has textbox role (R): pass - Text input fields with textbox role found
- ✅: Helper text is programmatically associated (R): pass
- ✅: Text inputs are keyboard focusable (R): pass
- ✅: Visual labels are defined and persistent (R): pass
- ✅: Required fields are indicated (visually and programmatically) (R): pass
- ✅: Inputs use appropriate autocomplete for purpose (R): pass
- ✅: Placeholder text is programmatically defined as a property (R): pass - No placeholder text present on text inputs
Axe Best Practice Issues (5) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 1 (Gemini 3 Flash Preview)
Instruction set: 1. Basic
PASS | Latency 0.00s
Axe WCAG: 0 | BP: 5
Assertions ✅
- ✅: Each text input has an accessible name (R): pass
- ✅: Visible label is included in accessible name (R): pass
- ✅: Each text input has textbox role (R): pass - Text input fields with textbox role found
- ✅: Helper text is programmatically associated (R): pass
- ✅: Text inputs are keyboard focusable (R): pass
- ✅: Visual labels are defined and persistent (R): pass
- ✅: Required fields are indicated (visually and programmatically) (R): pass
- ✅: Inputs use appropriate autocomplete for purpose (R): pass
- ✅: Placeholder text is programmatically defined as a property (R): pass - No placeholder text present on text inputs
Axe Best Practice Issues (5) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 2 (Gemini 3 Flash Preview)
Instruction set: 1. Basic
PASS | Latency 0.00s
Axe WCAG: 0 | BP: 5
Assertions ✅
- ✅: Each text input has an accessible name (R): pass
- ✅: Visible label is included in accessible name (R): pass
- ✅: Each text input has textbox role (R): pass - Text input fields with textbox role found
- ✅: Helper text is programmatically associated (R): pass
- ✅: Text inputs are keyboard focusable (R): pass
- ✅: Visual labels are defined and persistent (R): pass
- ✅: Required fields are indicated (visually and programmatically) (R): pass
- ✅: Inputs use appropriate autocomplete for purpose (R): pass
- ✅: Placeholder text is programmatically defined as a property (R): pass - No placeholder text present on text inputs
Axe Best Practice Issues (5) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 3 (Gemini 3 Flash Preview)
Instruction set: 1. Basic
PASS | Latency 0.00s
Axe WCAG: 0 | BP: 5
Assertions ✅
- ✅: Each text input has an accessible name (R): pass
- ✅: Visible label is included in accessible name (R): pass
- ✅: Each text input has textbox role (R): pass - Text input fields with textbox role found
- ✅: Helper text is programmatically associated (R): pass
- ✅: Text inputs are keyboard focusable (R): pass
- ✅: Visual labels are defined and persistent (R): pass
- ✅: Required fields are indicated (visually and programmatically) (R): pass
- ✅: Inputs use appropriate autocomplete for purpose (R): pass
- ✅: Placeholder text is programmatically defined as a property (R): pass - No placeholder text present on text inputs
Axe Best Practice Issues (5) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 4 (Gemini 3 Flash Preview)
Instruction set: 1. Basic
PASS | Latency 0.00s
Axe WCAG: 0 | BP: 5
Assertions ✅
- ✅: Each text input has an accessible name (R): pass
- ✅: Visible label is included in accessible name (R): pass
- ✅: Each text input has textbox role (R): pass - Text input fields with textbox role found
- ✅: Helper text is programmatically associated (R): pass
- ✅: Text inputs are keyboard focusable (R): pass
- ✅: Visual labels are defined and persistent (R): pass
- ✅: Required fields are indicated (visually and programmatically) (R): pass
- ✅: Inputs use appropriate autocomplete for purpose (R): pass
- ✅: Placeholder text is programmatically defined as a property (R): pass - No placeholder text present on text inputs
Axe Best Practice Issues (5) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 0 (Gemini 3 Flash Preview)
Instruction set: 0. Minimal
FAIL | Latency 0.00s
Axe WCAG: 0 | BP: 5
Assertions ❌
- ✅: Each text input has an accessible name (R): pass
- ✅: Visible label is included in accessible name (R): pass
- ✅: Each text input has textbox role (R): pass - Text input fields with textbox role found
- ✅: Helper text is programmatically associated (R): pass
- ✅: Text inputs are keyboard focusable (R): pass
- ✅: Visual labels are defined and persistent (R): pass
- ✅: Required fields are indicated (visually and programmatically) (R): pass
- ❌: Inputs use appropriate autocomplete for purpose (R): fail
- ✅: Placeholder text is programmatically defined as a property (R): pass - No placeholder text present on text inputs
Axe Best Practice Issues (5) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 1 (Gemini 3 Flash Preview)
Instruction set: 0. Minimal
PASS | Latency 0.00s
Axe WCAG: 0 | BP: 5
Assertions ✅
- ✅: Each text input has an accessible name (R): pass
- ✅: Visible label is included in accessible name (R): pass
- ✅: Each text input has textbox role (R): pass - Text input fields with textbox role found
- ✅: Helper text is programmatically associated (R): pass
- ✅: Text inputs are keyboard focusable (R): pass
- ✅: Visual labels are defined and persistent (R): pass
- ✅: Required fields are indicated (visually and programmatically) (R): pass
- ✅: Inputs use appropriate autocomplete for purpose (R): pass
- ✅: Placeholder text is programmatically defined as a property (R): pass - No placeholder text present on text inputs
Axe Best Practice Issues (5) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 2 (Gemini 3 Flash Preview)
Instruction set: 0. Minimal
FAIL | Latency 0.00s
Axe WCAG: 0 | BP: 5
Assertions ❌
- ✅: Each text input has an accessible name (R): pass
- ✅: Visible label is included in accessible name (R): pass
- ✅: Each text input has textbox role (R): pass - Text input fields with textbox role found
- ✅: Helper text is programmatically associated (R): pass
- ✅: Text inputs are keyboard focusable (R): pass
- ✅: Visual labels are defined and persistent (R): pass
- ✅: Required fields are indicated (visually and programmatically) (R): pass
- ❌: Inputs use appropriate autocomplete for purpose (R): fail
- ✅: Placeholder text is programmatically defined as a property (R): pass - No placeholder text present on text inputs
Axe Best Practice Issues (5) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 3 (Gemini 3 Flash Preview)
Instruction set: 0. Minimal
FAIL | Latency 0.00s
Axe WCAG: 0 | BP: 5
Assertions ❌
- ✅: Each text input has an accessible name (R): pass
- ✅: Visible label is included in accessible name (R): pass
- ✅: Each text input has textbox role (R): pass - Text input fields with textbox role found
- ✅: Helper text is programmatically associated (R): pass
- ✅: Text inputs are keyboard focusable (R): pass
- ✅: Visual labels are defined and persistent (R): pass
- ✅: Required fields are indicated (visually and programmatically) (R): pass
- ❌: Inputs use appropriate autocomplete for purpose (R): fail
- ✅: Placeholder text is programmatically defined as a property (R): pass - No placeholder text present on text inputs
Axe Best Practice Issues (5) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 4 (Gemini 3 Flash Preview)
Instruction set: 0. Minimal
PASS | Latency 0.00s
Axe WCAG: 0
Assertions ✅
- ✅: Each text input has an accessible name (R): pass
- ✅: Visible label is included in accessible name (R): pass
- ✅: Each text input has textbox role (R): pass - Text input fields with textbox role found
- ✅: Helper text is programmatically associated (R): pass
- ✅: Text inputs are keyboard focusable (R): pass
- ✅: Visual labels are defined and persistent (R): pass
- ✅: Required fields are indicated (visually and programmatically) (R): pass
- ✅: Inputs use appropriate autocomplete for purpose (R): pass
- ✅: Placeholder text is programmatically defined as a property (R): pass - No placeholder text present on text inputs
Sample 0 (Gemini 3 Flash Preview)
Instruction set: 2. Detailed Instructions
PASS | Latency 0.00s
Axe WCAG: 0
Assertions ✅
- ✅: Each text input has an accessible name (R): pass
- ✅: Visible label is included in accessible name (R): pass
- ✅: Each text input has textbox role (R): pass - Text input fields with textbox role found
- ✅: Helper text is programmatically associated (R): pass
- ✅: Text inputs are keyboard focusable (R): pass
- ✅: Visual labels are defined and persistent (R): pass
- ✅: Required fields are indicated (visually and programmatically) (R): pass
- ✅: Inputs use appropriate autocomplete for purpose (R): pass
- ✅: Placeholder text is programmatically defined as a property (R): pass - No placeholder text present on text inputs
Sample 1 (Gemini 3 Flash Preview)
Instruction set: 2. Detailed Instructions
PASS | Latency 0.00s
Axe WCAG: 0
Assertions ✅
- ✅: Each text input has an accessible name (R): pass
- ✅: Visible label is included in accessible name (R): pass
- ✅: Each text input has textbox role (R): pass - Text input fields with textbox role found
- ✅: Helper text is programmatically associated (R): pass
- ✅: Text inputs are keyboard focusable (R): pass
- ✅: Visual labels are defined and persistent (R): pass
- ✅: Required fields are indicated (visually and programmatically) (R): pass
- ✅: Inputs use appropriate autocomplete for purpose (R): pass
- ✅: Placeholder text is programmatically defined as a property (R): pass - No placeholder text present on text inputs
Sample 2 (Gemini 3 Flash Preview)
Instruction set: 2. Detailed Instructions
PASS | Latency 0.00s
Axe WCAG: 0
Assertions ✅
- ✅: Each text input has an accessible name (R): pass
- ✅: Visible label is included in accessible name (R): pass
- ✅: Each text input has textbox role (R): pass - Text input fields with textbox role found
- ✅: Helper text is programmatically associated (R): pass
- ✅: Text inputs are keyboard focusable (R): pass
- ✅: Visual labels are defined and persistent (R): pass
- ✅: Required fields are indicated (visually and programmatically) (R): pass
- ✅: Inputs use appropriate autocomplete for purpose (R): pass
- ✅: Placeholder text is programmatically defined as a property (R): pass - No placeholder text present on text inputs
Sample 3 (Gemini 3 Flash Preview)
Instruction set: 2. Detailed Instructions
PASS | Latency 0.00s
Axe WCAG: 0
Assertions ✅
- ✅: Each text input has an accessible name (R): pass
- ✅: Visible label is included in accessible name (R): pass
- ✅: Each text input has textbox role (R): pass - Text input fields with textbox role found
- ✅: Helper text is programmatically associated (R): pass
- ✅: Text inputs are keyboard focusable (R): pass
- ✅: Visual labels are defined and persistent (R): pass
- ✅: Required fields are indicated (visually and programmatically) (R): pass
- ✅: Inputs use appropriate autocomplete for purpose (R): pass
- ✅: Placeholder text is programmatically defined as a property (R): pass - No placeholder text present on text inputs
Sample 4 (Gemini 3 Flash Preview)
Instruction set: 2. Detailed Instructions
PASS | Latency 0.00s
Axe WCAG: 0
Assertions ✅
- ✅: Each text input has an accessible name (R): pass
- ✅: Visible label is included in accessible name (R): pass
- ✅: Each text input has textbox role (R): pass - Text input fields with textbox role found
- ✅: Helper text is programmatically associated (R): pass
- ✅: Text inputs are keyboard focusable (R): pass
- ✅: Visual labels are defined and persistent (R): pass
- ✅: Required fields are indicated (visually and programmatically) (R): pass
- ✅: Inputs use appropriate autocomplete for purpose (R): pass
- ✅: Placeholder text is programmatically defined as a property (R): pass - No placeholder text present on text inputs
Gemini 3.5 Pro Preview
Pass rate per instruction set (same order as the tables below): 0% | 80% | 100% | 100%
Aggregates are shown when filtering to a specific instruction set.
Samples: 25 | Passes: 0
| pass@1 | pass@5 | pass@10 |
|---|---|---|
| 0% | 0% | 0% |
Samples: 5 | Passes: 4
| pass@1 | pass@5 | pass@10 |
|---|---|---|
| 80% | 100% | 100% |
Samples: 5 | Passes: 5
| pass@1 | pass@5 | pass@10 |
|---|---|---|
| 100% | 100% | 100% |
Samples: 5 | Passes: 5
| pass@1 | pass@5 | pass@10 |
|---|---|---|
| 100% | 100% | 100% |
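The report does not state how pass@k is computed. The following is a minimal sketch, assuming the common combinatorial estimator 1 - C(n-c, k) / C(n, k) for n samples with c passes. Clamping k to n when fewer than k samples exist is an assumption made here so that the 5-sample tables above can be reproduced; the function name `pass_at_k` is illustrative.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Estimate the probability that at least one of k samples,
    drawn without replacement from n samples with c passes, passes."""
    if n < 1 or k < 1 or not 0 <= c <= n:
        raise ValueError("need n >= 1, k >= 1 and 0 <= c <= n")
    if c == 0:
        return 0.0  # no passing sample can ever be drawn
    k = min(k, n)  # assumption: clamp k when fewer than k samples exist
    if n - c < k:
        return 1.0  # every draw of k samples must include a pass
    return 1.0 - comb(n - c, k) / comb(n, k)


# Rows from the tables above: (samples, passes)
for n, c in [(25, 0), (5, 4), (5, 5)]:
    rates = [round(100 * pass_at_k(n, c, k)) for k in (1, 5, 10)]
    print(f"Samples: {n} | Passes: {c} | pass@1/5/10: {rates}")
```

Under these assumptions the sketch gives 0%, 0%, 0% for 25 samples with 0 passes and 80%, 100%, 100% for 5 samples with 4 passes, which matches the tables above.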
Sample 0 (Gemini 3.5 Pro Preview)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 1 | BP: 5
Assertions ❌
- ✅: Each text input has an accessible name (R): pass
- ✅: Visible label is included in accessible name (R): pass
- ✅: Each text input has textbox role (R): pass - Text input fields with textbox role found
- ❌: Helper text is programmatically associated (R): fail - Found `Please enter a valid email address (e.g., user@example.com).` Found `Preferred format: 123-456-7890 (Optional)`
- ✅: Text inputs are keyboard focusable (R): pass
- ✅: Visual labels are defined and persistent (R): pass
- ❌: Required fields are indicated (visually and programmatically) (R): fail - (2x) Input is programmatically required but has no visual required indicator
- ❌: Inputs use appropriate autocomplete for purpose (R): fail
- ✅: Placeholder text is programmatically defined as a property (R): pass - No placeholder text present on text inputs
Axe WCAG Failures (1) ❌
- (1x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (5) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- page-has-heading-one (moderate): Ensure that the page, or at least one of its frames contains a level-one heading (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
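The harness's own logic for the "Helper text is programmatically associated" assertion is not shown in this report. As a rough sketch of one way such a check could work, the code below flags hint elements that no `aria-describedby` attribute points to. The `hint` class name, the sample markup, and the function name `unassociated_hints` are all hypothetical.

```python
from html.parser import HTMLParser


class HintAudit(HTMLParser):
    """Collects hint element ids and every id referenced by aria-describedby."""

    def __init__(self, hint_class="hint"):
        super().__init__()
        self.hint_class = hint_class  # hypothetical class marking helper text
        self.hint_ids = []            # id (or None) of each hint element
        self.referenced = set()       # ids named by any aria-describedby

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        if self.hint_class in (a.get("class") or "").split():
            self.hint_ids.append(a.get("id"))
        self.referenced.update((a.get("aria-describedby") or "").split())


def unassociated_hints(html, hint_class="hint"):
    """Return ids of hint elements that no element references."""
    audit = HintAudit(hint_class)
    audit.feed(html)
    return [h for h in audit.hint_ids if h is None or h not in audit.referenced]


# Illustrative markup only; not taken from any model output in this report.
failing = (
    '<label for="email">Email</label><input id="email" type="email">'
    '<p class="hint" id="email-hint">Please enter a valid email address.</p>'
)
passing = failing.replace(
    'type="email"', 'type="email" aria-describedby="email-hint"'
)
print(unassociated_hints(failing))
print(unassociated_hints(passing))
```

A real check would also need a way to recognise helper text that carries no marker class, which this sketch does not attempt.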
Sample 1 (Gemini 3.5 Pro Preview)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 1 | BP: 5
Assertions ❌
- ✅: Each text input has an accessible name (R): pass
- ✅: Visible label is included in accessible name (R): pass
- ✅: Each text input has textbox role (R): pass - Text input fields with textbox role found
- ✅: Helper text is programmatically associated (R): pass
- ✅: Text inputs are keyboard focusable (R): pass
- ✅: Visual labels are defined and persistent (R): pass
- ✅: Required fields are indicated (visually and programmatically) (R): pass
- ❌: Inputs use appropriate autocomplete for purpose (R): fail
- ✅: Placeholder text is programmatically defined as a property (R): pass - No placeholder text present on text inputs
Axe WCAG Failures (1) ❌
- (1x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (5) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
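The "Inputs use appropriate autocomplete for purpose" assertion fails in every control sample for this model, but the report does not show how it is evaluated. A minimal sketch, assuming the check expects `autocomplete="email"` on email inputs and `autocomplete="tel"` on telephone inputs; the type-to-token mapping and the function name `missing_autocomplete` are assumptions, not the harness's actual rules.

```python
from html.parser import HTMLParser

# Assumed mapping from input type to the expected HTML autocomplete token.
EXPECTED_TOKEN = {"email": "email", "tel": "tel"}


class AutocompleteAudit(HTMLParser):
    """Records inputs whose autocomplete attribute lacks the expected token."""

    def __init__(self):
        super().__init__()
        self.missing = []  # (input id or name, expected token)

    def handle_starttag(self, tag, attrs):
        if tag != "input":
            return
        a = dict(attrs)
        expected = EXPECTED_TOKEN.get((a.get("type") or "text").lower())
        if expected is None:
            return  # no expectation for this input type in the sketch
        tokens = (a.get("autocomplete") or "").lower().split()
        if expected not in tokens:
            self.missing.append((a.get("id") or a.get("name"), expected))


def missing_autocomplete(html):
    """Return (identifier, expected token) for each input lacking its token."""
    audit = AutocompleteAudit()
    audit.feed(html)
    return audit.missing


# Illustrative markup only; not taken from any model output in this report.
form = (
    '<input id="email" type="email">'
    '<input id="phone" type="tel" autocomplete="tel">'
)
print(missing_autocomplete(form))
```

Inferring purpose from `type` alone is a simplification; fields such as names or addresses use `type="text"` and would need another signal, for example the label text.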
Sample 2 (Gemini 3.5 Pro Preview)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 1 | BP: 5
Assertions ❌
- ✅: Each text input has an accessible name (R): pass
- ✅: Visible label is included in accessible name (R): pass
- ✅: Each text input has textbox role (R): pass - Text input fields with textbox role found
- ❌: Helper text is programmatically associated (R): fail - Found `Please enter a valid email address (e.g., user@example.com).` Found `Preferred format: 123-456-7890.`
- ✅: Text inputs are keyboard focusable (R): pass
- ✅: Visual labels are defined and persistent (R): pass
- ✅: Required fields are indicated (visually and programmatically) (R): pass
- ❌: Inputs use appropriate autocomplete for purpose (R): fail
- ✅: Placeholder text is programmatically defined as a property (R): pass - No placeholder text present on text inputs
Axe WCAG Failures (1) ❌
- (1x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (5) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
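The axe `color-contrast` rule reported in these samples is based on the WCAG 2 contrast ratio, for which Level AA requires at least 4.5:1 for normal-size text. The report does not list the failing colours, so the sketch below only illustrates the formula; the hex values and function names are illustrative.

```python
def relative_luminance(hex_color: str) -> float:
    """WCAG 2 relative luminance of an sRGB colour given as '#rrggbb'."""
    value = hex_color.lstrip("#")
    channels = [int(value[i:i + 2], 16) / 255 for i in (0, 2, 4)]
    # Linearise each sRGB channel as defined by WCAG 2.
    r, g, b = [
        c / 12.92 if c <= 0.03928 else ((c + 0.055) / 1.055) ** 2.4
        for c in channels
    ]
    return 0.2126 * r + 0.7152 * g + 0.0722 * b


def contrast_ratio(foreground: str, background: str) -> float:
    """Contrast ratio (L1 + 0.05) / (L2 + 0.05), with L1 the lighter colour."""
    lighter, darker = sorted(
        (relative_luminance(foreground), relative_luminance(background)),
        reverse=True,
    )
    return (lighter + 0.05) / (darker + 0.05)


AA_NORMAL_TEXT = 4.5  # WCAG 2 AA minimum for normal-size text

for fg in ("#767676", "#777777"):
    ratio = contrast_ratio(fg, "#ffffff")
    verdict = "pass" if ratio >= AA_NORMAL_TEXT else "fail"
    print(f"{fg} on #ffffff: {ratio:.2f}:1 ({verdict})")
```

The two greys differ by a single step per channel yet fall on opposite sides of the threshold, which suggests how easily muted helper-text colours can fail this check.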
Sample 3 (Gemini 3.5 Pro Preview)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 0 | BP: 6
Assertions ❌
- ✅: Each text input has an accessible name (R): pass
- ✅: Visible label is included in accessible name (R): pass
- ✅: Each text input has textbox role (R): pass - Text input fields with textbox role found
- ❌: Helper text is programmatically associated (R): fail - Found `Please enter a valid email address (e.g., name@example.com).` Found `Preferred format: 123-456-7890 (digits only or dashes).`
- ✅: Text inputs are keyboard focusable (R): pass
- ✅: Visual labels are defined and persistent (R): pass
- ✅: Required fields are indicated (visually and programmatically) (R): pass
- ❌: Inputs use appropriate autocomplete for purpose (R): fail
- ✅: Placeholder text is programmatically defined as a property (R): pass - No placeholder text present on text inputs
Axe Best Practice Issues (6) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- page-has-heading-one (moderate): Ensure that the page, or at least one of its frames contains a level-one heading (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 4 (Gemini 3.5 Pro Preview)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 1 | BP: 5
Assertions ❌
- ✅: Each text input has an accessible name (R): pass
- ✅: Visible label is included in accessible name (R): pass
- ✅: Each text input has textbox role (R): pass - Text input fields with textbox role found
- ✅: Helper text is programmatically associated (R): pass
- ✅: Text inputs are keyboard focusable (R): pass
- ✅: Visual labels are defined and persistent (R): pass
- ✅: Required fields are indicated (visually and programmatically) (R): pass
- ❌: Inputs use appropriate autocomplete for purpose (R): fail
- ✅: Placeholder text is programmatically defined as a property (R): pass - No placeholder text present on text inputs
Axe WCAG Failures (1) ❌
- (1x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (5) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 5 (Gemini 3.5 Pro Preview)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 1 | BP: 5
Assertions ❌
- ✅: Each text input has an accessible name (R): pass
- ✅: Visible label is included in accessible name (R): pass
- ✅: Each text input has textbox role (R): pass - Text input fields with textbox role found
- ✅: Helper text is programmatically associated (R): pass
- ✅: Text inputs are keyboard focusable (R): pass
- ✅: Visual labels are defined and persistent (R): pass
- ✅: Required fields are indicated (visually and programmatically) (R): pass
- ❌: Inputs use appropriate autocomplete for purpose (R): fail
- ✅: Placeholder text is programmatically defined as a property (R): pass - No placeholder text present on text inputs
Axe WCAG Failures (1) ❌
- (1x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (5) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 6 (Gemini 3.5 Pro Preview)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 1 | BP: 5
Assertions ❌
- ✅: Each text input has an accessible name (R): pass
- ✅: Visible label is included in accessible name (R): pass
- ✅: Each text input has textbox role (R): pass - Text input fields with textbox role found
- ❌: Helper text is programmatically associated (R): fail - Found `Please enter a valid email address (e.g., name@example.com).` Found `Preferred format: 123-456-7890.`
- ✅: Text inputs are keyboard focusable (R): pass
- ✅: Visual labels are defined and persistent (R): pass
- ✅: Required fields are indicated (visually and programmatically) (R): pass
- ❌: Inputs use appropriate autocomplete for purpose (R): fail
- ✅: Placeholder text is programmatically defined as a property (R): pass - No placeholder text present on text inputs
Axe WCAG Failures (1) ❌
- (1x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (5) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 7 (Gemini 3.5 Pro Preview)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 1 | BP: 5
Assertions ❌
- ✅: Each text input has an accessible name (R): pass
- ✅: Visible label is included in accessible name (R): pass
- ✅: Each text input has textbox role (R): pass - Text input fields with textbox role found
- ✅: Helper text is programmatically associated (R): pass
- ✅: Text inputs are keyboard focusable (R): pass
- ✅: Visual labels are defined and persistent (R): pass
- ✅: Required fields are indicated (visually and programmatically) (R): pass
- ❌: Inputs use appropriate autocomplete for purpose (R): fail
- ✅: Placeholder text is programmatically defined as a property (R): pass - No placeholder text present on text inputs
Axe WCAG Failures (1) ❌
- (1x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (5) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 8 (Gemini 3.5 Pro Preview)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 1 | BP: 5
Assertions ❌
- ✅: Each text input has an accessible name (R): pass
- ✅: Visible label is included in accessible name (R): pass
- ✅: Each text input has textbox role (R): pass - Text input fields with textbox role found
- ❌: Helper text is programmatically associated (R): fail - Found `Please enter a valid email address (e.g., name@example.com).` Found `Optional. Preferred format: 123-456-7890.`
- ✅: Text inputs are keyboard focusable (R): pass
- ✅: Visual labels are defined and persistent (R): pass
- ✅: Required fields are indicated (visually and programmatically) (R): pass
- ❌: Inputs use appropriate autocomplete for purpose (R): fail
- ✅: Placeholder text is programmatically defined as a property (R): pass - No placeholder text present on text inputs
Axe WCAG Failures (1) ❌
- (1x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (5) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 9 (Gemini 3.5 Pro Preview)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 1 | BP: 5
Assertions ❌
- ✅: Each text input has an accessible name (R): pass
- ✅: Visible label is included in accessible name (R): pass
- ✅: Each text input has textbox role (R): pass - Text input fields with textbox role found
- ❌: Helper text is programmatically associated (R): fail - Found `Please enter a valid email address (e.g., user@example.com).` Found `Preferred format: 123-456-7890.`
- ✅: Text inputs are keyboard focusable (R): pass
- ✅: Visual labels are defined and persistent (R): pass
- ✅: Required fields are indicated (visually and programmatically) (R): pass
- ❌: Inputs use appropriate autocomplete for purpose (R): fail
- ✅: Placeholder text is programmatically defined as a property (R): pass - No placeholder text present on text inputs
Axe WCAG Failures (1) ❌
- (1x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (5) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 10 (Gemini 3.5 Pro Preview)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 1 | BP: 5
Assertions ❌
- ✅: Each text input has an accessible name (R): pass
- ✅: Visible label is included in accessible name (R): pass
- ✅: Each text input has textbox role (R): pass - Text input fields with textbox role found
- ❌: Helper text is programmatically associated (R): fail - Found `Please enter a valid email address (e.g., user@example.com).` Found `Preferred format: 123-456-7890 (Optional).`
- ✅: Text inputs are keyboard focusable (R): pass
- ✅: Visual labels are defined and persistent (R): pass
- ✅: Required fields are indicated (visually and programmatically) (R): pass
- ❌: Inputs use appropriate autocomplete for purpose (R): fail
- ✅: Placeholder text is programmatically defined as a property (R): pass - No placeholder text present on text inputs
Axe WCAG Failures (1) ❌
- (1x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (5) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 11 (Gemini 3.5 Pro Preview)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 1 | BP: 5
Assertions ❌
- ✅: Each text input has an accessible name (R): pass
- ✅: Visible label is included in accessible name (R): pass
- ✅: Each text input has textbox role (R): pass - Text input fields with textbox role found
- ✅: Helper text is programmatically associated (R): pass
- ✅: Text inputs are keyboard focusable (R): pass
- ✅: Visual labels are defined and persistent (R): pass
- ✅: Required fields are indicated (visually and programmatically) (R): pass
- ❌: Inputs use appropriate autocomplete for purpose (R): fail
- ✅: Placeholder text is programmatically defined as a property (R): pass - No placeholder text present on text inputs
Axe WCAG Failures (1) ❌
- (1x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (5) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 12 (Gemini 3.5 Pro Preview)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 0 | BP: 5
Assertions ❌
- ✅: Each text input has an accessible name (R): pass
- ✅: Visible label is included in accessible name (R): pass
- ✅: Each text input has textbox role (R): pass - Text input fields with textbox role found
- ✅: Helper text is programmatically associated (R): pass
- ✅: Text inputs are keyboard focusable (R): pass
- ✅: Visual labels are defined and persistent (R): pass
- ✅: Required fields are indicated (visually and programmatically) (R): pass
- ❌: Inputs use appropriate autocomplete for purpose (R): fail
- ✅: Placeholder text is programmatically defined as a property (R): pass - No placeholder text present on text inputs
Axe Best Practice Issues (5) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 13 (Gemini 3.5 Pro Preview)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 1 | BP: 5
Assertions ❌
- ✅: Each text input has an accessible name (R): pass
- ✅: Visible label is included in accessible name (R): pass
- ✅: Each text input has textbox role (R): pass - Text input fields with textbox role found
- ❌: Helper text is programmatically associated (R): fail - Found `Please enter a valid email address (e.g., user@example.com).` Found `Preferred format: 123-456-7890.`
- ✅: Text inputs are keyboard focusable (R): pass
- ✅: Visual labels are defined and persistent (R): pass
- ✅: Required fields are indicated (visually and programmatically) (R): pass
- ❌: Inputs use appropriate autocomplete for purpose (R): fail
- ✅: Placeholder text is programmatically defined as a property (R): pass - No placeholder text present on text inputs
Axe WCAG Failures (1) ❌
- (1x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (5) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 14 (Gemini 3.5 Pro Preview)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 1 | BP: 5
Assertions ❌
- ✅: Each text input has an accessible name (R): pass
- ✅: Visible label is included in accessible name (R): pass
- ✅: Each text input has textbox role (R): pass - Text input fields with textbox role found
- ❌: Helper text is programmatically associated (R): fail - Found `Please enter a valid email address (e.g., name@example.com).` Found `Optional. Preferred format: 123-456-7890.`
- ✅: Text inputs are keyboard focusable (R): pass
- ✅: Visual labels are defined and persistent (R): pass
- ✅: Required fields are indicated (visually and programmatically) (R): pass
- ❌: Inputs use appropriate autocomplete for purpose (R): fail
- ✅: Placeholder text is programmatically defined as a property (R): pass - No placeholder text present on text inputs
Axe WCAG Failures (1) ❌
- (1x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (5) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 15 (Gemini 3.5 Pro Preview)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 1 | BP: 5
Assertions ❌
- ✅: Each text input has an accessible name (R): pass
- ✅: Visible label is included in accessible name (R): pass
- ✅: Each text input has textbox role (R): pass - Text input fields with textbox role found
- ✅: Helper text is programmatically associated (R): pass
- ✅: Text inputs are keyboard focusable (R): pass
- ✅: Visual labels are defined and persistent (R): pass
- ✅: Required fields are indicated (visually and programmatically) (R): pass
- ❌: Inputs use appropriate autocomplete for purpose (R): fail
- ✅: Placeholder text is programmatically defined as a property (R): pass - No placeholder text present on text inputs
Axe WCAG Failures (1) ❌
- (1x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (5) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 16 (Gemini 3.5 Pro Preview)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 1 | BP: 5
Assertions ❌
- ✅: Each text input has an accessible name (R): pass
- ✅: Visible label is included in accessible name (R): pass
- ✅: Each text input has textbox role (R): pass - Text input fields with textbox role found
- ✅: Helper text is programmatically associated (R): pass
- ✅: Text inputs are keyboard focusable (R): pass
- ✅: Visual labels are defined and persistent (R): pass
- ✅: Required fields are indicated (visually and programmatically) (R): pass
- ❌: Inputs use appropriate autocomplete for purpose (R): fail
- ✅: Placeholder text is programmatically defined as a property (R): pass - No placeholder text present on text inputs
Axe WCAG Failures (1) ❌
- (1x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (5) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- page-has-heading-one (moderate): Ensure that the page, or at least one of its frames contains a level-one heading (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 17 (Gemini 3.5 Pro Preview)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 1 | BP: 6
Assertions ❌
- ✅: Each text input has an accessible name (R): pass
- ✅: Visible label is included in accessible name (R): pass
- ✅: Each text input has textbox role (R): pass - Text input fields with textbox role found
- ✅: Helper text is programmatically associated (R): pass
- ✅: Text inputs are keyboard focusable (R): pass
- ✅: Visual labels are defined and persistent (R): pass
- ✅: Required fields are indicated (visually and programmatically) (R): pass
- ❌: Inputs use appropriate autocomplete for purpose (R): fail
- ✅: Placeholder text is programmatically defined as a property (R): pass - No placeholder text present on text inputs
Axe WCAG Failures (1) ❌
- (1x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (6) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- page-has-heading-one (moderate): Ensure that the page, or at least one of its frames contains a level-one heading (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 18 (Gemini 3.5 Pro Preview)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 1 | BP: 5
Assertions ❌
- ✅: Each text input has an accessible name (R): pass
- ✅: Visible label is included in accessible name (R): pass
- ✅: Each text input has textbox role (R): pass - Text input fields with textbox role found
- ✅: Helper text is programmatically associated (R): pass
- ✅: Text inputs are keyboard focusable (R): pass
- ✅: Visual labels are defined and persistent (R): pass
- ✅: Required fields are indicated (visually and programmatically) (R): pass
- ❌: Inputs use appropriate autocomplete for purpose (R): fail
- ✅: Placeholder text is programmatically defined as a property (R): pass - No placeholder text present on text inputs
Axe WCAG Failures (1) ❌
- (1x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (5) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 19 (Gemini 3.5 Pro Preview)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 1 | BP: 5
Assertions ❌
- ✅: Each text input has an accessible name (R): pass
- ✅: Visible label is included in accessible name (R): pass
- ✅: Each text input has textbox role (R): pass - Text input fields with textbox role found
- ❌: Helper text is programmatically associated (R): fail - Found `Please enter a valid email address (e.g., name@example.com).` Found `Preferred format: 123-456-7890.`
- ✅: Text inputs are keyboard focusable (R): pass
- ✅: Visual labels are defined and persistent (R): pass
- ✅: Required fields are indicated (visually and programmatically) (R): pass
- ❌: Inputs use appropriate autocomplete for purpose (R): fail
- ✅: Placeholder text is programmatically defined as a property (R): pass - No placeholder text present on text inputs
Axe WCAG Failures (1) ❌
- (1x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (5) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 20 (Gemini 3.5 Pro Preview)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 1 | BP: 6
Assertions ❌
- ✅: Each text input has an accessible name (R): pass
- ✅: Visible label is included in accessible name (R): pass
- ✅: Each text input has textbox role (R): pass - Text input fields with textbox role found
- ❌: Helper text is programmatically associated (R): fail - Found `Please enter a valid email address (e.g., name@example.com).` Found `Preferred format: (123) 456-7890.`
- ✅: Text inputs are keyboard focusable (R): pass
- ✅: Visual labels are defined and persistent (R): pass
- ✅: Required fields are indicated (visually and programmatically) (R): pass
- ❌: Inputs use appropriate autocomplete for purpose (R): fail
- ✅: Placeholder text is programmatically defined as a property (R): pass - No placeholder text present on text inputs
Axe WCAG Failures (1) ❌
- (1x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (6) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- page-has-heading-one (moderate): Ensure that the page, or at least one of its frames contains a level-one heading (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 21 (Gemini 3.5 Pro Preview)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 1 | BP: 6
Assertions ❌
- ✅: Each text input has an accessible name (R): pass
- ✅: Visible label is included in accessible name (R): pass
- ✅: Each text input has textbox role (R): pass - Text input fields with textbox role found
- ❌: Helper text is programmatically associated (R): fail - Found `Please enter a valid email address (e.g., user@example.com).` Found `Preferred format: (123) 456-7890.`
- ✅: Text inputs are keyboard focusable (R): pass
- ✅: Visual labels are defined and persistent (R): pass
- ✅: Required fields are indicated (visually and programmatically) (R): pass
- ❌: Inputs use appropriate autocomplete for purpose (R): fail
- ✅: Placeholder text is programmatically defined as a property (R): pass - No placeholder text present on text inputs
Axe WCAG Failures (1) ❌
- (1x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (6) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- page-has-heading-one (moderate): Ensure that the page, or at least one of its frames contains a level-one heading (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 22 (Gemini 3.5 Pro Preview)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 1 | BP: 6
Assertions ❌
- ✅: Each text input has an accessible name (R): pass
- ✅: Visible label is included in accessible name (R): pass
- ✅: Each text input has textbox role (R): pass - Text input fields with textbox role found
- ❌: Helper text is programmatically associated (R): fail - Found `Please enter a valid email (e.g., name@example.com).` Found `Preferred format: (123) 456-7890`
- ✅: Text inputs are keyboard focusable (R): pass
- ✅: Visual labels are defined and persistent (R): pass
- ✅: Required fields are indicated (visually and programmatically) (R): pass
- ❌: Inputs use appropriate autocomplete for purpose (R): fail
- ✅: Placeholder text is programmatically defined as a property (R): pass - No placeholder text present on text inputs
Axe WCAG Failures (1) ❌
- (1x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (6) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- page-has-heading-one (moderate): Ensure that the page, or at least one of its frames contains a level-one heading (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 23 (Gemini 3.5 Pro Preview)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 1 | BP: 6
Assertions ❌
- ✅: Each text input has an accessible name (R): pass
- ✅: Visible label is included in accessible name (R): pass
- ✅: Each text input has textbox role (R): pass - Text input fields with textbox role found
- ❌: Helper text is programmatically associated (R): fail - Found `Please enter a valid email (e.g., user@example.com).` Found `Preferred format: (123) 456-7890`
- ✅: Text inputs are keyboard focusable (R): pass
- ✅: Visual labels are defined and persistent (R): pass
- ✅: Required fields are indicated (visually and programmatically) (R): pass
- ❌: Inputs use appropriate autocomplete for purpose (R): fail
- ✅: Placeholder text is programmatically defined as a property (R): pass - No placeholder text present on text inputs
Axe WCAG Failures (1) ❌
- (1x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (6) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- page-has-heading-one (moderate): Ensure that the page, or at least one of its frames contains a level-one heading (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 24 (Gemini 3.5 Pro Preview)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 1 | BP: 6
Assertions ❌
- ✅: Each text input has an accessible name (R): pass
- ✅: Visible label is included in accessible name (R): pass
- ✅: Each text input has textbox role (R): pass - Text input fields with textbox role found
- ❌: Helper text is programmatically associated (R): fail - Found `Please enter a valid email (e.g., user@example.com).` Found `Preferred format: (123) 456-7890.`
- ✅: Text inputs are keyboard focusable (R): pass
- ✅: Visual labels are defined and persistent (R): pass
- ✅: Required fields are indicated (visually and programmatically) (R): pass
- ❌: Inputs use appropriate autocomplete for purpose (R): fail
- ✅: Placeholder text is programmatically defined as a property (R): pass - No placeholder text present on text inputs
Axe WCAG Failures (1) ❌
- (1x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (6) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- page-has-heading-one (moderate): Ensure that the page, or at least one of its frames contains a level-one heading (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
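The two assertions that fail repeatedly in the control samples above, helper text association and autocomplete, can be illustrated with a small check. The sketch below is not the benchmark's own assertion code. The form markup, the `FormAudit` class and the `audit` function are hypothetical, and the check assumes every input has a hint that should be referenced through `aria-describedby`.

```python
from html.parser import HTMLParser

# Hypothetical markup: each hint is tied to its input with aria-describedby,
# and each input declares its purpose with an autocomplete token.
FORM = """
<label for="email">Email</label>
<input id="email" type="email" autocomplete="email" aria-describedby="email-hint" required>
<p id="email-hint">Please enter a valid email (e.g., user@example.com).</p>
<label for="phone">Phone</label>
<input id="phone" type="tel" autocomplete="tel" aria-describedby="phone-hint">
<p id="phone-hint">Preferred format: (123) 456-7890</p>
"""


class FormAudit(HTMLParser):
    """Collects every element id and the attributes of every <input>."""

    def __init__(self):
        super().__init__()
        self.inputs = []
        self.ids = set()

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        if "id" in attributes:
            self.ids.add(attributes["id"])
        if tag == "input":
            self.inputs.append(attributes)


def audit(html):
    """Return a list of problems; an empty list means both checks passed."""
    parser = FormAudit()
    parser.feed(html)
    problems = []
    for field in parser.inputs:
        name = field.get("id", "<no id>")
        # aria-describedby holds a space-separated list of element ids.
        described_by = (field.get("aria-describedby") or "").split()
        if not described_by or any(ref not in parser.ids for ref in described_by):
            problems.append(f"{name}: helper text not associated")
        if not field.get("autocomplete"):
            problems.append(f"{name}: missing autocomplete")
    return problems


print(audit(FORM))
print(audit('<input id="name">'))
```

The first call reports no problems. The second reports both, which mirrors the pattern of the failing control samples.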
Sample 0 (Gemini 3.5 Pro Preview)
Instruction set: 1. Basic
PASS | Latency 0.00s
Axe WCAG: 0
Assertions ✅
- ✅: Each text input has an accessible name (R): pass
- ✅: Visible label is included in accessible name (R): pass
- ✅: Each text input has textbox role (R): pass - Text input fields with textbox role found
- ✅: Helper text is programmatically associated (R): pass
- ✅: Text inputs are keyboard focusable (R): pass
- ✅: Visual labels are defined and persistent (R): pass
- ✅: Required fields are indicated (visually and programmatically) (R): pass
- ✅: Inputs use appropriate autocomplete for purpose (R): pass
- ✅: Placeholder text is programmatically defined as a property (R): pass - No placeholder text present on text inputs
Sample 1 (Gemini 3.5 Pro Preview)
Instruction set: 1. Basic
PASS | Latency 0.00s
Axe WCAG: 0
Assertions ✅
- ✅: Each text input has an accessible name (R): pass
- ✅: Visible label is included in accessible name (R): pass
- ✅: Each text input has textbox role (R): pass - Text input fields with textbox role found
- ✅: Helper text is programmatically associated (R): pass
- ✅: Text inputs are keyboard focusable (R): pass
- ✅: Visual labels are defined and persistent (R): pass
- ✅: Required fields are indicated (visually and programmatically) (R): pass
- ✅: Inputs use appropriate autocomplete for purpose (R): pass
- ✅: Placeholder text is programmatically defined as a property (R): pass - No placeholder text present on text inputs
Sample 2 (Gemini 3.5 Pro Preview)
Instruction set: 1. Basic
PASS | Latency 0.00s
Axe WCAG: 0
Assertions ✅
- ✅: Each text input has an accessible name (R): pass
- ✅: Visible label is included in accessible name (R): pass
- ✅: Each text input has textbox role (R): pass - Text input fields with textbox role found
- ✅: Helper text is programmatically associated (R): pass
- ✅: Text inputs are keyboard focusable (R): pass
- ✅: Visual labels are defined and persistent (R): pass
- ✅: Required fields are indicated (visually and programmatically) (R): pass
- ✅: Inputs use appropriate autocomplete for purpose (R): pass
- ✅: Placeholder text is programmatically defined as a property (R): pass - No placeholder text present on text inputs
Sample 3 (Gemini 3.5 Pro Preview)
Instruction set: 1. Basic
PASS | Latency 0.00s
Axe WCAG: 0
Assertions ✅
- ✅: Each text input has an accessible name (R): pass
- ✅: Visible label is included in accessible name (R): pass
- ✅: Each text input has textbox role (R): pass - Text input fields with textbox role found
- ✅: Helper text is programmatically associated (R): pass
- ✅: Text inputs are keyboard focusable (R): pass
- ✅: Visual labels are defined and persistent (R): pass
- ✅: Required fields are indicated (visually and programmatically) (R): pass
- ✅: Inputs use appropriate autocomplete for purpose (R): pass
- ✅: Placeholder text is programmatically defined as a property (R): pass - No placeholder text present on text inputs
Sample 4 (Gemini 3.5 Pro Preview)
Instruction set: 1. Basic
PASS | Latency 0.00s
Axe WCAG: 0
Assertions ✅
- ✅: Each text input has an accessible name (R): pass
- ✅: Visible label is included in accessible name (R): pass
- ✅: Each text input has textbox role (R): pass - Text input fields with textbox role found
- ✅: Helper text is programmatically associated (R): pass
- ✅: Text inputs are keyboard focusable (R): pass
- ✅: Visual labels are defined and persistent (R): pass
- ✅: Required fields are indicated (visually and programmatically) (R): pass
- ✅: Inputs use appropriate autocomplete for purpose (R): pass
- ✅: Placeholder text is programmatically defined as a property (R): pass - No placeholder text present on text inputs
Sample 0 (Gemini 3.5 Pro Preview)
Instruction set: 0. Minimal
PASS | Latency 0.00s
Axe WCAG: 0
Assertions ✅
- ✅: Each text input has an accessible name (R): pass
- ✅: Visible label is included in accessible name (R): pass
- ✅: Each text input has textbox role (R): pass - Text input fields with textbox role found
- ✅: Helper text is programmatically associated (R): pass
- ✅: Text inputs are keyboard focusable (R): pass
- ✅: Visual labels are defined and persistent (R): pass
- ✅: Required fields are indicated (visually and programmatically) (R): pass
- ✅: Inputs use appropriate autocomplete for purpose (R): pass
- ✅: Placeholder text is programmatically defined as a property (R): pass - No placeholder text present on text inputs
Sample 1 (Gemini 3.5 Pro Preview)
Instruction set: 0. Minimal
PASS | Latency 0.00s
Axe WCAG: 0
Assertions ✅
- ✅: Each text input has an accessible name (R): pass
- ✅: Visible label is included in accessible name (R): pass
- ✅: Each text input has textbox role (R): pass - Text input fields with textbox role found
- ✅: Helper text is programmatically associated (R): pass
- ✅: Text inputs are keyboard focusable (R): pass
- ✅: Visual labels are defined and persistent (R): pass
- ✅: Required fields are indicated (visually and programmatically) (R): pass
- ✅: Inputs use appropriate autocomplete for purpose (R): pass
- ✅: Placeholder text is programmatically defined as a property (R): pass - No placeholder text present on text inputs
Sample 2 (Gemini 3.5 Pro Preview)
Instruction set: 0. Minimal
PASS | Latency 0.00s
Axe WCAG: 0
Assertions ✅
- ✅: Each text input has an accessible name (R): pass
- ✅: Visible label is included in accessible name (R): pass
- ✅: Each text input has textbox role (R): pass - Text input fields with textbox role found
- ✅: Helper text is programmatically associated (R): pass
- ✅: Text inputs are keyboard focusable (R): pass
- ✅: Visual labels are defined and persistent (R): pass
- ✅: Required fields are indicated (visually and programmatically) (R): pass
- ✅: Inputs use appropriate autocomplete for purpose (R): pass
- ✅: Placeholder text is programmatically defined as a property (R): pass - No placeholder text present on text inputs
Sample 3 (Gemini 3.5 Pro Preview)
Instruction set: 0. Minimal
FAIL | Latency 0.00s
Axe WCAG: 0 | BP: 5
Assertions ❌
- ✅: Each text input has an accessible name (R): pass
- ✅: Visible label is included in accessible name (R): pass
- ✅: Each text input has textbox role (R): pass - Text input fields with textbox role found
- ✅: Helper text is programmatically associated (R): pass
- ✅: Text inputs are keyboard focusable (R): pass
- ✅: Visual labels are defined and persistent (R): pass
- ✅: Required fields are indicated (visually and programmatically) (R): pass
- ❌: Inputs use appropriate autocomplete for purpose (R): fail
- ✅: Placeholder text is programmatically defined as a property (R): pass - No placeholder text present on text inputs
Axe Best Practice Issues (5) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 4 (Gemini 3.5 Pro Preview)
Instruction set: 0. Minimal
PASS | Latency 0.00s
Axe WCAG: 0
Assertions ✅
- ✅: Each text input has an accessible name (R): pass
- ✅: Visible label is included in accessible name (R): pass
- ✅: Each text input has textbox role (R): pass - Text input fields with textbox role found
- ✅: Helper text is programmatically associated (R): pass
- ✅: Text inputs are keyboard focusable (R): pass
- ✅: Visual labels are defined and persistent (R): pass
- ✅: Required fields are indicated (visually and programmatically) (R): pass
- ✅: Inputs use appropriate autocomplete for purpose (R): pass
- ✅: Placeholder text is programmatically defined as a property (R): pass - No placeholder text present on text inputs
Sample 0 (Gemini 3.5 Pro Preview)
Instruction set: 2. Detailed Instructions
PASS | Latency 0.00s
Axe WCAG: 0
Assertions ✅
- ✅: Each text input has an accessible name (R): pass
- ✅: Visible label is included in accessible name (R): pass
- ✅: Each text input has textbox role (R): pass - Text input fields with textbox role found
- ✅: Helper text is programmatically associated (R): pass
- ✅: Text inputs are keyboard focusable (R): pass
- ✅: Visual labels are defined and persistent (R): pass
- ✅: Required fields are indicated (visually and programmatically) (R): pass
- ✅: Inputs use appropriate autocomplete for purpose (R): pass
- ✅: Placeholder text is programmatically defined as a property (R): pass - No placeholder text present on text inputs
Sample 1 (Gemini 3.5 Pro Preview)
Instruction set: 2. Detailed Instructions
PASS | Latency 0.00s
Axe WCAG: 0
Assertions ✅
- ✅: Each text input has an accessible name (R): pass
- ✅: Visible label is included in accessible name (R): pass
- ✅: Each text input has textbox role (R): pass - Text input fields with textbox role found
- ✅: Helper text is programmatically associated (R): pass
- ✅: Text inputs are keyboard focusable (R): pass
- ✅: Visual labels are defined and persistent (R): pass
- ✅: Required fields are indicated (visually and programmatically) (R): pass
- ✅: Inputs use appropriate autocomplete for purpose (R): pass
- ✅: Placeholder text is programmatically defined as a property (R): pass - No placeholder text present on text inputs
Sample 2 (Gemini 3.5 Pro Preview)
Instruction set: 2. Detailed Instructions
PASS | Latency 0.00s
Axe WCAG: 0
Assertions ✅
- ✅: Each text input has an accessible name (R): pass
- ✅: Visible label is included in accessible name (R): pass
- ✅: Each text input has textbox role (R): pass - Text input fields with textbox role found
- ✅: Helper text is programmatically associated (R): pass
- ✅: Text inputs are keyboard focusable (R): pass
- ✅: Visual labels are defined and persistent (R): pass
- ✅: Required fields are indicated (visually and programmatically) (R): pass
- ✅: Inputs use appropriate autocomplete for purpose (R): pass
- ✅: Placeholder text is programmatically defined as a property (R): pass - No placeholder text present on text inputs
Sample 3 (Gemini 3.5 Pro Preview)
Instruction set: 2. Detailed Instructions
PASS | Latency 0.00s
Axe WCAG: 0
Assertions ✅
- ✅: Each text input has an accessible name (R): pass
- ✅: Visible label is included in accessible name (R): pass
- ✅: Each text input has textbox role (R): pass - Text input fields with textbox role found
- ✅: Helper text is programmatically associated (R): pass
- ✅: Text inputs are keyboard focusable (R): pass
- ✅: Visual labels are defined and persistent (R): pass
- ✅: Required fields are indicated (visually and programmatically) (R): pass
- ✅: Inputs use appropriate autocomplete for purpose (R): pass
- ✅: Placeholder text is programmatically defined as a property (R): pass - No placeholder text present on text inputs
Sample 4 (Gemini 3.5 Pro Preview)
Instruction set: 2. Detailed Instructions
PASS | Latency 0.00s
Axe WCAG: 0
Assertions ✅
- ✅: Each text input has an accessible name (R): pass
- ✅: Visible label is included in accessible name (R): pass
- ✅: Each text input has textbox role (R): pass - Text input fields with textbox role found
- ✅: Helper text is programmatically associated (R): pass
- ✅: Text inputs are keyboard focusable (R): pass
- ✅: Visual labels are defined and persistent (R): pass
- ✅: Required fields are indicated (visually and programmatically) (R): pass
- ✅: Inputs use appropriate autocomplete for purpose (R): pass
- ✅: Placeholder text is programmatically defined as a property (R): pass - No placeholder text present on text inputs
GPT-5 Mini
Aggregates are computed separately for each instruction set.
Samples: 25 | Passes: 5
| pass@1 | pass@5 | pass@10 |
|---|---|---|
| 20% | 71% | 94% |
Samples: 5 | Passes: 3
| pass@1 | pass@5 | pass@10 |
|---|---|---|
| 60% | 100% | 100% |
Samples: 5 | Passes: 2
| pass@1 | pass@5 | pass@10 |
|---|---|---|
| 40% | 100% | 100% |
Samples: 5 | Passes: 3
| pass@1 | pass@5 | pass@10 |
|---|---|---|
| 60% | 100% | 100% |
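The aggregates above are consistent with the standard unbiased pass@k estimator, 1 - C(n-c, k) / C(n, k), where n is the number of samples and c the number of passes. The sketch below assumes that is how the report computes them. The function name `pass_at_k` is hypothetical, and capping k at n for the 5-sample sets is an assumption made so that the estimate stays defined when k exceeds the sample count.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Estimate the chance that at least one of k samples passes.

    n: total samples, c: passing samples, k: samples drawn without replacement.
    """
    if not 0 <= c <= n or k < 1:
        raise ValueError("require 0 <= c <= n and k >= 1")
    k = min(k, n)  # assumption: cap k at n when fewer than k samples exist
    # comb(n - c, k) is 0 when fewer than k failing samples exist,
    # so the estimate becomes 1.0 in that case.
    return 1.0 - comb(n - c, k) / comb(n, k)


# (samples, passes) pairs taken from the aggregates above.
for n, c in [(25, 5), (5, 3), (5, 2)]:
    row = [round(pass_at_k(n, c, k) * 100) for k in (1, 5, 10)]
    print(n, c, row)
```

For 25 samples with 5 passes this gives 20%, 71% and 94% for k = 1, 5 and 10, matching the first table.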
Sample 0 (GPT-5 Mini)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 0 | BP: 2
Assertions ❌
- ✅: Each text input has an accessible name (R): pass
- ✅: Visible label is included in accessible name (R): pass
- ✅: Each text input has textbox role (R): pass - Text input fields with textbox role found
- ❌: Helper text is programmatically associated (R): fail - Found `Jane Doe`
- ✅: Text inputs are keyboard focusable (R): pass
- ✅: Visual labels are defined and persistent (R): pass
- ✅: Required fields are indicated (visually and programmatically) (R): pass
- ❌: Inputs use appropriate autocomplete for purpose (R): fail
- ✅: Placeholder text is programmatically defined as a property (R): pass
Axe Best Practice Issues (2) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 1 (GPT-5 Mini)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 0 | BP: 5
Assertions ❌
- ✅: Each text input has an accessible name (R): pass
- ✅: Visible label is included in accessible name (R): pass
- ✅: Each text input has textbox role (R): pass - Text input fields with textbox role found
- ❌: Helper text is programmatically associated (R): fail - Found `Jane Doe Please enter your full name.`
- ✅: Text inputs are keyboard focusable (R): pass
- ✅: Visual labels are defined and persistent (R): pass
- ✅: Required fields are indicated (visually and programmatically) (R): pass
- ❌: Inputs use appropriate autocomplete for purpose (R): fail
- ✅: Placeholder text is programmatically defined as a property (R): pass
Axe Best Practice Issues (5) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 2 (GPT-5 Mini)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 0 | BP: 5
Assertions ❌
- ✅: Each text input has an accessible name (R): pass
- ✅: Visible label is included in accessible name (R): pass
- ✅: Each text input has textbox role (R): pass - Text input fields with textbox role found
- ❌: Helper text is programmatically associated (R): fail - Found `Jane Doe`
- ✅: Text inputs are keyboard focusable (R): pass
- ✅: Visual labels are defined and persistent (R): pass
- ✅: Required fields are indicated (visually and programmatically) (R): pass
- ✅: Inputs use appropriate autocomplete for purpose (R): pass
- ✅: Placeholder text is programmatically defined as a property (R): pass
Axe Best Practice Issues (5) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 3 (GPT-5 Mini)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 1
Assertions ❌
- ✅: Each text input has an accessible name (R): pass
- ✅: Visible label is included in accessible name (R): pass
- ✅: Each text input has textbox role (R): pass - Text input fields with textbox role found
- ❌: Helper text is programmatically associated (R): fail - Found `First and last name`
- ✅: Text inputs are keyboard focusable (R): pass
- ✅: Visual labels are defined and persistent (R): pass
- ✅: Required fields are indicated (visually and programmatically) (R): pass
- ❌: Inputs use appropriate autocomplete for purpose (R): fail
- ✅: Placeholder text is programmatically defined as a property (R): pass
Axe WCAG Failures (1) ❌
- (1x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Sample 4 (GPT-5 Mini)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 0 | BP: 5
Assertions ❌
- ✅: Each text input has an accessible name (R): pass
- ✅: Visible label is included in accessible name (R): pass
- ✅: Each text input has textbox role (R): pass - Text input fields with textbox role found
- ❌: Helper text is programmatically associated (R): fail - Found `This field is required.`
- ✅: Text inputs are keyboard focusable (R): pass
- ✅: Visual labels are defined and persistent (R): pass
- ✅: Required fields are indicated (visually and programmatically) (R): pass
- ❌: Inputs use appropriate autocomplete for purpose (R): fail
- ✅: Placeholder text is programmatically defined as a property (R): pass
Axe Best Practice Issues (5) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 5 (GPT-5 Mini)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 0 | BP: 5
Assertions ❌
- ✅: Each text input has an accessible name (R): pass
- ❌: Visible label is included in accessible name (R): fail
- ✅: Each text input has textbox role (R): pass - Text input fields with textbox role found
- ❌: Helper text is programmatically associated (R): fail - Found `e.g., Jane Doe`
- ✅: Text inputs are keyboard focusable (R): pass
- ✅: Visual labels are defined and persistent (R): pass
- ✅: Required fields are indicated (visually and programmatically) (R): pass
- ❌: Inputs use appropriate autocomplete for purpose (R): fail
- ✅: Placeholder text is programmatically defined as a property (R): pass
Axe Best Practice Issues (5) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 6 (GPT-5 Mini)
Instruction set: Control
PASS | Latency 0.00s
Axe WCAG: 0 | BP: 5
Assertions ✅
- ✅: Each text input has an accessible name (R): pass
- ✅: Visible label is included in accessible name (R): pass
- ✅: Each text input has textbox role (R): pass - Text input fields with textbox role found
- ✅: Helper text is programmatically associated (R): pass
- ✅: Text inputs are keyboard focusable (R): pass
- ✅: Visual labels are defined and persistent (R): pass
- ✅: Required fields are indicated (visually and programmatically) (R): pass
- ✅: Inputs use appropriate autocomplete for purpose (R): pass
- ✅: Placeholder text is programmatically defined as a property (R): pass - No placeholder text present on text inputs
Axe Best Practice Issues (5) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 7 (GPT-5 Mini)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 0 | BP: 5
Assertions ❌
- ✅: Each text input has an accessible name (R): pass
- ✅: Visible label is included in accessible name (R): pass
- ✅: Each text input has textbox role (R): pass - Text input fields with textbox role found
- ❌: Helper text is programmatically associated (R): fail - Found `Jane Doe`
- ✅: Text inputs are keyboard focusable (R): pass
- ✅: Visual labels are defined and persistent (R): pass
- ✅: Required fields are indicated (visually and programmatically) (R): pass
- ✅: Inputs use appropriate autocomplete for purpose (R): pass
- ✅: Placeholder text is programmatically defined as a property (R): pass
Axe Best Practice Issues (5) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 8 (GPT-5 Mini)
Instruction set: Control
PASS | Latency 0.00s
Axe WCAG: 0 | BP: 5
Assertions ✅
- ✅: Each text input has an accessible name (R): pass
- ✅: Visible label is included in accessible name (R): pass
- ✅: Each text input has textbox role (R): pass - Text input fields with textbox role found
- ✅: Helper text is programmatically associated (R): pass
- ✅: Text inputs are keyboard focusable (R): pass
- ✅: Visual labels are defined and persistent (R): pass
- ✅: Required fields are indicated (visually and programmatically) (R): pass
- ✅: Inputs use appropriate autocomplete for purpose (R): pass
- ✅: Placeholder text is programmatically defined as a property (R): pass
Axe Best Practice Issues (5) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 9 (GPT-5 Mini)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 0 | BP: 5
Assertions ❌
- ✅: Each text input has an accessible name (R): pass
- ✅: Visible label is included in accessible name (R): pass
- ✅: Each text input has textbox role (R): pass - Text input fields with textbox role found
- ❌: Helper text is programmatically associated (R): fail - Found `e.g., Jane Doe`
- ✅: Text inputs are keyboard focusable (R): pass
- ✅: Visual labels are defined and persistent (R): pass
- ✅: Required fields are indicated (visually and programmatically) (R): pass
- ❌: Inputs use appropriate autocomplete for purpose (R): fail
- ✅: Placeholder text is programmatically defined as a property (R): pass
Axe Best Practice Issues (5) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- page-has-heading-one (moderate): Ensure that the page, or at least one of its frames contains a level-one heading (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 10 (GPT-5 Mini)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 0 | BP: 1
Assertions ❌
- ✅: Each text input has an accessible name (R): pass
- ✅: Visible label is included in accessible name (R): pass
- ✅: Each text input has textbox role (R): pass - Text input fields with textbox role found
- ❌: Helper text is programmatically associated (R): fail - Found `Jane Doe`
- ✅: Text inputs are keyboard focusable (R): pass
- ✅: Visual labels are defined and persistent (R): pass
- ✅: Required fields are indicated (visually and programmatically) (R): pass
- ❌: Inputs use appropriate autocomplete for purpose (R): fail
- ✅: Placeholder text is programmatically defined as a property (R): pass
Axe Best Practice Issues (1) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
Sample 11 (GPT-5 Mini)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 0 | BP: 5
Assertions ❌
- ✅: Each text input has an accessible name (R): pass
- ✅: Visible label is included in accessible name (R): pass
- ✅: Each text input has textbox role (R): pass - Text input fields with textbox role found
- ❌: Helper text is programmatically associated (R): fail - Found `Your full name`
- ✅: Text inputs are keyboard focusable (R): pass
- ✅: Visual labels are defined and persistent (R): pass
- ✅: Required fields are indicated (visually and programmatically) (R): pass
- ❌: Inputs use appropriate autocomplete for purpose (R): fail
- ✅: Placeholder text is programmatically defined as a property (R): pass
Axe Best Practice Issues (5) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 12 (GPT-5 Mini)
Instruction set: Control
PASS | Latency 0.00s
Axe WCAG: 0
Assertions ✅
- ✅: Each text input has an accessible name (R): pass
- ✅: Visible label is included in accessible name (R): pass
- ✅: Each text input has textbox role (R): pass - Text input fields with textbox role found
- ✅: Helper text is programmatically associated (R): pass
- ✅: Text inputs are keyboard focusable (R): pass
- ✅: Visual labels are defined and persistent (R): pass
- ✅: Required fields are indicated (visually and programmatically) (R): pass
- ✅: Inputs use appropriate autocomplete for purpose (R): pass
- ✅: Placeholder text is programmatically defined as a property (R): pass - No placeholder text present on text inputs
Sample 13 (GPT-5 Mini)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 0 | BP: 5
Assertions ❌
- ✅: Each text input has an accessible name (R): pass
- ✅: Visible label is included in accessible name (R): pass
- ✅: Each text input has textbox role (R): pass - Text input fields with textbox role found
- ✅: Helper text is programmatically associated (R): pass
- ✅: Text inputs are keyboard focusable (R): pass
- ✅: Visual labels are defined and persistent (R): pass
- ✅: Required fields are indicated (visually and programmatically) (R): pass
- ❌: Inputs use appropriate autocomplete for purpose (R): fail
- ✅: Placeholder text is programmatically defined as a property (R): pass - No placeholder text present on text inputs
Axe Best Practice Issues (5) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- page-has-heading-one (moderate): Ensure that the page, or at least one of its frames contains a level-one heading (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 14 (GPT-5 Mini)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 0
Assertions ❌
- ✅: Each text input has an accessible name (R): pass
- ✅: Visible label is included in accessible name (R): pass
- ✅: Each text input has textbox role (R): pass - Text input fields with textbox role found
- ❌: Helper text is programmatically associated (R): fail - Found `Jane Doe`
- ✅: Text inputs are keyboard focusable (R): pass
- ✅: Visual labels are defined and persistent (R): pass
- ✅: Required fields are indicated (visually and programmatically) (R): pass
- ❌: Inputs use appropriate autocomplete for purpose (R): fail
- ✅: Placeholder text is programmatically defined as a property (R): pass
Sample 15 (GPT-5 Mini)
Instruction set: Control
PASS | Latency 0.00s
Axe WCAG: 0 | BP: 5
Assertions ✅
- ✅: Each text input has an accessible name (R): pass
- ✅: Visible label is included in accessible name (R): pass
- ✅: Each text input has textbox role (R): pass - Text input fields with textbox role found
- ✅: Helper text is programmatically associated (R): pass
- ✅: Text inputs are keyboard focusable (R): pass
- ✅: Visual labels are defined and persistent (R): pass
- ✅: Required fields are indicated (visually and programmatically) (R): pass
- ✅: Inputs use appropriate autocomplete for purpose (R): pass
- ✅: Placeholder text is programmatically defined as a property (R): pass - No placeholder text present on text inputs
Axe Best Practice Issues (5) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 16 (GPT-5 Mini)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 0 | BP: 5
Assertions ❌
- ✅: Each text input has an accessible name (R): pass
- ✅: Visible label is included in accessible name (R): pass
- ✅: Each text input has textbox role (R): pass - Text input fields with textbox role found
- ✅: Helper text is programmatically associated (R): pass
- ✅: Text inputs are keyboard focusable (R): pass
- ✅: Visual labels are defined and persistent (R): pass
- ✅: Required fields are indicated (visually and programmatically) (R): pass
- ❌: Inputs use appropriate autocomplete for purpose (R): fail
- ✅: Placeholder text is programmatically defined as a property (R): pass - No placeholder text present on text inputs
Axe Best Practice Issues (5) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 17 (GPT-5 Mini)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 0
Assertions ❌
- ✅: Each text input has an accessible name (R): pass
- ✅: Visible label is included in accessible name (R): pass
- ✅: Each text input has textbox role (R): pass - Text input fields with textbox role found
- ❌: Helper text is programmatically associated (R): fail - Found `Jane Doe`
- ✅: Text inputs are keyboard focusable (R): pass
- ✅: Visual labels are defined and persistent (R): pass
- ✅: Required fields are indicated (visually and programmatically) (R): pass
- ✅: Inputs use appropriate autocomplete for purpose (R): pass
- ✅: Placeholder text is programmatically defined as a property (R): pass
Sample 18 (GPT-5 Mini)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 0 | BP: 5
Assertions ❌
- ✅: Each text input has an accessible name (R): pass
- ✅: Visible label is included in accessible name (R): pass
- ✅: Each text input has textbox role (R): pass - Text input fields with textbox role found
- ❌: Helper text is programmatically associated (R): fail - Found `e.g., Alex Morgan`
- ✅: Text inputs are keyboard focusable (R): pass
- ✅: Visual labels are defined and persistent (R): pass
- ✅: Required fields are indicated (visually and programmatically) (R): pass
- ❌: Inputs use appropriate autocomplete for purpose (R): fail
- ✅: Placeholder text is programmatically defined as a property (R): pass
Axe Best Practice Issues (5) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 19 (GPT-5 Mini)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 0
Assertions ❌
- ✅: Each text input has an accessible name (R): pass
- ✅: Visible label is included in accessible name (R): pass
- ✅: Each text input has textbox role (R): pass - Text input fields with textbox role found
- ❌: Helper text is programmatically associated (R): fail - Found `Jane Doe`
- ✅: Text inputs are keyboard focusable (R): pass
- ✅: Visual labels are defined and persistent (R): pass
- ✅: Required fields are indicated (visually and programmatically) (R): pass
- ❌: Inputs use appropriate autocomplete for purpose (R): fail
- ✅: Placeholder text is programmatically defined as a property (R): pass
Sample 20 (GPT-5 Mini)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 0 | BP: 1
Assertions ❌
- ✅: Each text input has an accessible name (R): pass
- ✅: Visible label is included in accessible name (R): pass
- ✅: Each text input has textbox role (R): pass - Text input fields with textbox role found
- ✅: Helper text is programmatically associated (R): pass
- ✅: Text inputs are keyboard focusable (R): pass
- ✅: Visual labels are defined and persistent (R): pass
- ✅: Required fields are indicated (visually and programmatically) (R): pass
- ❌: Inputs use appropriate autocomplete for purpose (R): fail
- ✅: Placeholder text is programmatically defined as a property (R): pass
Axe Best Practice Issues (1) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
Sample 21 (GPT-5 Mini)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 0 | BP: 5
Assertions ❌
- ✅: Each text input has an accessible name (R): pass
- ✅: Visible label is included in accessible name (R): pass
- ✅: Each text input has textbox role (R): pass - Text input fields with textbox role found
- ❌: Helper text is programmatically associated (R): fail - Found `Jane Doe`
- ✅: Text inputs are keyboard focusable (R): pass
- ✅: Visual labels are defined and persistent (R): pass
- ✅: Required fields are indicated (visually and programmatically) (R): pass
- ❌: Inputs use appropriate autocomplete for purpose (R): fail
- ✅: Placeholder text is programmatically defined as a property (R): pass
Axe Best Practice Issues (5) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 22 (GPT-5 Mini)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 0
Assertions ❌
- ✅: Each text input has an accessible name (R): pass
- ✅: Visible label is included in accessible name (R): pass
- ✅: Each text input has textbox role (R): pass - Text input fields with textbox role found
- ✅: Helper text is programmatically associated (R): pass
- ✅: Text inputs are keyboard focusable (R): pass
- ✅: Visual labels are defined and persistent (R): pass
- ✅: Required fields are indicated (visually and programmatically) (R): pass
- ❌: Inputs use appropriate autocomplete for purpose (R): fail
- ✅: Placeholder text is programmatically defined as a property (R): pass
Sample 23 (GPT-5 Mini)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 0 | BP: 1
Assertions ❌
- ✅: Each text input has an accessible name (R): pass
- ✅: Visible label is included in accessible name (R): pass
- ✅: Each text input has textbox role (R): pass - Text input fields with textbox role found
- ✅: Helper text is programmatically associated (R): pass
- ✅: Text inputs are keyboard focusable (R): pass
- ✅: Visual labels are defined and persistent (R): pass
- ✅: Required fields are indicated (visually and programmatically) (R): pass
- ❌: Inputs use appropriate autocomplete for purpose (R): fail
- ✅: Placeholder text is programmatically defined as a property (R): pass
Axe Best Practice Issues (1) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
Sample 24 (GPT-5 Mini)
Instruction set: Control
PASS | Latency 0.00s
Axe WCAG: 0 | BP: 5
Assertions ✅
- ✅: Each text input has an accessible name (R): pass
- ✅: Visible label is included in accessible name (R): pass
- ✅: Each text input has textbox role (R): pass - Text input fields with textbox role found
- ✅: Helper text is programmatically associated (R): pass
- ✅: Text inputs are keyboard focusable (R): pass
- ✅: Visual labels are defined and persistent (R): pass
- ✅: Required fields are indicated (visually and programmatically) (R): pass
- ✅: Inputs use appropriate autocomplete for purpose (R): pass
- ✅: Placeholder text is programmatically defined as a property (R): pass - No placeholder text present on text inputs
Axe Best Practice Issues (5) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 0 (GPT-5 Mini)
Instruction set: 1. Basic
PASS | Latency 0.00s
Axe WCAG: 0
Assertions ✅
- ✅: Each text input has an accessible name (R): pass
- ✅: Visible label is included in accessible name (R): pass
- ✅: Each text input has textbox role (R): pass - Text input fields with textbox role found
- ✅: Helper text is programmatically associated (R): pass
- ✅: Text inputs are keyboard focusable (R): pass
- ✅: Visual labels are defined and persistent (R): pass
- ✅: Required fields are indicated (visually and programmatically) (R): pass
- ✅: Inputs use appropriate autocomplete for purpose (R): pass
- ✅: Placeholder text is programmatically defined as a property (R): pass
Sample 1 (GPT-5 Mini)
Instruction set: 1. Basic
FAIL | Latency 0.00s
Axe WCAG: 0
Assertions ❌
- ✅: Each text input has an accessible name (R): pass
- ✅: Visible label is included in accessible name (R): pass
- ✅: Each text input has textbox role (R): pass - Text input fields with textbox role found
- ❌: Helper text is programmatically associated (R): fail - Found `First Last`
- ✅: Text inputs are keyboard focusable (R): pass
- ✅: Visual labels are defined and persistent (R): pass
- ✅: Required fields are indicated (visually and programmatically) (R): pass
- ✅: Inputs use appropriate autocomplete for purpose (R): pass
- ✅: Placeholder text is programmatically defined as a property (R): pass
Sample 2 (GPT-5 Mini)
Instruction set: 1. Basic
FAIL | Latency 0.00s
Axe WCAG: 0
Assertions ❌
- ✅: Each text input has an accessible name (R): pass
- ✅: Visible label is included in accessible name (R): pass
- ✅: Each text input has textbox role (R): pass - Text input fields with textbox role found
- ✅: Helper text is programmatically associated (R): pass
- ✅: Text inputs are keyboard focusable (R): pass
- ✅: Visual labels are defined and persistent (R): pass
- ✅: Required fields are indicated (visually and programmatically) (R): pass
- ❌: Inputs use appropriate autocomplete for purpose (R): fail
- ✅: Placeholder text is programmatically defined as a property (R): pass - No placeholder text present on text inputs
Sample 3 (GPT-5 Mini)
Instruction set: 1. Basic
FAIL | Latency 0.00s
Axe WCAG: 0
Assertions ❌
- ✅: Each text input has an accessible name (R): pass
- ❌: Visible label is included in accessible name (R): fail
- ✅: Each text input has textbox role (R): pass - Text input fields with textbox role found
- ✅: Helper text is programmatically associated (R): pass
- ✅: Text inputs are keyboard focusable (R): pass
- ✅: Visual labels are defined and persistent (R): pass
- ✅: Required fields are indicated (visually and programmatically) (R): pass
- ✅: Inputs use appropriate autocomplete for purpose (R): pass
- ✅: Placeholder text is programmatically defined as a property (R): pass
Sample 4 (GPT-5 Mini)
Instruction set: 1. Basic
PASS | Latency 0.00s
Axe WCAG: 0
Assertions ✅
- ✅: Each text input has an accessible name (R): pass
- ✅: Visible label is included in accessible name (R): pass
- ✅: Each text input has textbox role (R): pass - Text input fields with textbox role found
- ✅: Helper text is programmatically associated (R): pass
- ✅: Text inputs are keyboard focusable (R): pass
- ✅: Visual labels are defined and persistent (R): pass
- ✅: Required fields are indicated (visually and programmatically) (R): pass
- ✅: Inputs use appropriate autocomplete for purpose (R): pass
- ✅: Placeholder text is programmatically defined as a property (R): pass
Sample 0 (GPT-5 Mini)
Instruction set: 0. Minimal
PASS | Latency 0.00s
Axe WCAG: 0
Assertions ✅
- ✅: Each text input has an accessible name (R): pass
- ✅: Visible label is included in accessible name (R): pass
- ✅: Each text input has textbox role (R): pass - Text input fields with textbox role found
- ✅: Helper text is programmatically associated (R): pass
- ✅: Text inputs are keyboard focusable (R): pass
- ✅: Visual labels are defined and persistent (R): pass
- ✅: Required fields are indicated (visually and programmatically) (R): pass
- ✅: Inputs use appropriate autocomplete for purpose (R): pass
- ✅: Placeholder text is programmatically defined as a property (R): pass
Sample 1 (GPT-5 Mini)
Instruction set: 0. Minimal
PASS | Latency 0.00s
Axe WCAG: 0
Assertions ✅
- ✅: Each text input has an accessible name (R): pass
- ✅: Visible label is included in accessible name (R): pass
- ✅: Each text input has textbox role (R): pass - Text input fields with textbox role found
- ✅: Helper text is programmatically associated (R): pass
- ✅: Text inputs are keyboard focusable (R): pass
- ✅: Visual labels are defined and persistent (R): pass
- ✅: Required fields are indicated (visually and programmatically) (R): pass
- ✅: Inputs use appropriate autocomplete for purpose (R): pass
- ✅: Placeholder text is programmatically defined as a property (R): pass
Sample 2 (GPT-5 Mini)
Instruction set: 0. Minimal
FAIL | Latency 0.00s
Axe WCAG: 0
Assertions ❌
- ✅: Each text input has an accessible name (R): pass
- ✅: Visible label is included in accessible name (R): pass
- ✅: Each text input has textbox role (R): pass - Text input fields with textbox role found
- ❌: Helper text is programmatically associated (R): fail - Found `First Last`
- ✅: Text inputs are keyboard focusable (R): pass
- ✅: Visual labels are defined and persistent (R): pass
- ✅: Required fields are indicated (visually and programmatically) (R): pass
- ❌: Inputs use appropriate autocomplete for purpose (R): fail
- ✅: Placeholder text is programmatically defined as a property (R): pass
Sample 3 (GPT-5 Mini)
Instruction set: 0. Minimal
PASS | Latency 0.00s
Axe WCAG: 0
Assertions ✅
- ✅: Each text input has an accessible name (R): pass
- ✅: Visible label is included in accessible name (R): pass
- ✅: Each text input has textbox role (R): pass - Text input fields with textbox role found
- ✅: Helper text is programmatically associated (R): pass
- ✅: Text inputs are keyboard focusable (R): pass
- ✅: Visual labels are defined and persistent (R): pass
- ✅: Required fields are indicated (visually and programmatically) (R): pass
- ✅: Inputs use appropriate autocomplete for purpose (R): pass
- ✅: Placeholder text is programmatically defined as a property (R): pass
Sample 4 (GPT-5 Mini)
Instruction set: 0. Minimal
FAIL | Latency 0.00s
Axe WCAG: 0
Assertions ❌
- ✅: Each text input has an accessible name (R): pass
- ✅: Visible label is included in accessible name (R): pass
- ✅: Each text input has textbox role (R): pass - Text input fields with textbox role found
- ❌: Helper text is programmatically associated (R): fail - Found `e.g., Alex Morgan`
- ✅: Text inputs are keyboard focusable (R): pass
- ✅: Visual labels are defined and persistent (R): pass
- ✅: Required fields are indicated (visually and programmatically) (R): pass
- ❌: Inputs use appropriate autocomplete for purpose (R): fail
- ✅: Placeholder text is programmatically defined as a property (R): pass
Sample 0 (GPT-5 Mini)
Instruction set: 2. Detailed Instructions
PASS | Latency 0.00s
Axe WCAG: 0
Assertions ✅
- ✅: Each text input has an accessible name (R): pass
- ✅: Visible label is included in accessible name (R): pass
- ✅: Each text input has textbox role (R): pass - Text input fields with textbox role found
- ✅: Helper text is programmatically associated (R): pass
- ✅: Text inputs are keyboard focusable (R): pass
- ✅: Visual labels are defined and persistent (R): pass
- ✅: Required fields are indicated (visually and programmatically) (R): pass
- ✅: Inputs use appropriate autocomplete for purpose (R): pass
- ✅: Placeholder text is programmatically defined as a property (R): pass - No placeholder text present on text inputs
Sample 1 (GPT-5 Mini)
Instruction set: 2. Detailed Instructions
PASS | Latency 0.00s
Axe WCAG: 0 | BP: 2
Assertions ✅
- ✅: Each text input has an accessible name (R): pass
- ✅: Visible label is included in accessible name (R): pass
- ✅: Each text input has textbox role (R): pass - Text input fields with textbox role found
- ✅: Helper text is programmatically associated (R): pass
- ✅: Text inputs are keyboard focusable (R): pass
- ✅: Visual labels are defined and persistent (R): pass
- ✅: Required fields are indicated (visually and programmatically) (R): pass
- ✅: Inputs use appropriate autocomplete for purpose (R): pass
- ✅: Placeholder text is programmatically defined as a property (R): pass - No placeholder text present on text inputs
Axe Best Practice Issues (2) ⚠️
- landmark-banner-is-top-level (moderate): Ensure the banner landmark is at top level (Best Practice - does not affect pass/fail)
- landmark-no-duplicate-banner (moderate): Ensure the document has at most one banner landmark (Best Practice - does not affect pass/fail)
Sample 2 (GPT-5 Mini)
Instruction set: 2. Detailed Instructions
FAIL | Latency 0.00s
Axe WCAG: 4
Assertions ❌
- ✅: Each text input has an accessible name (R): pass
- ✅: Visible label is included in accessible name (R): pass
- ✅: Each text input has textbox role (R): pass - Text input fields with textbox role found
- ✅: Helper text is programmatically associated (R): pass
- ✅: Text inputs are keyboard focusable (R): pass
- ✅: Visual labels are defined and persistent (R): pass
- ✅: Required fields are indicated (visually and programmatically) (R): pass
- ❌: Inputs use appropriate autocomplete for purpose (R): fail
- ✅: Placeholder text is programmatically defined as a property (R): pass - No placeholder text present on text inputs
Axe WCAG Failures (4) ❌
- (4x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Sample 3 (GPT-5 Mini)
Instruction set: 2. Detailed Instructions
PASS | Latency 0.00s
Axe WCAG: 0
Assertions ✅
- ✅: Each text input has an accessible name (R): pass
- ✅: Visible label is included in accessible name (R): pass
- ✅: Each text input has textbox role (R): pass - Text input fields with textbox role found
- ✅: Helper text is programmatically associated (R): pass
- ✅: Text inputs are keyboard focusable (R): pass
- ✅: Visual labels are defined and persistent (R): pass
- ✅: Required fields are indicated (visually and programmatically) (R): pass
- ✅: Inputs use appropriate autocomplete for purpose (R): pass
- ✅: Placeholder text is programmatically defined as a property (R): pass
Sample 4 (GPT-5 Mini)
Instruction set: 2. Detailed Instructions
FAIL | Latency 0.00s
Axe WCAG: 0
Assertions ❌
- ✅: Each text input has an accessible name (R): pass
- ✅: Visible label is included in accessible name (R): pass
- ✅: Each text input has textbox role (R): pass - Text input fields with textbox role found
- ❌: Helper text is programmatically associated (R): fail - Found `First and last name`
- ✅: Text inputs are keyboard focusable (R): pass
- ✅: Visual labels are defined and persistent (R): pass
- ✅: Required fields are indicated (visually and programmatically) (R): pass
- ✅: Inputs use appropriate autocomplete for purpose (R): pass
- ✅: Placeholder text is programmatically defined as a property (R): pass
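The per-sample listings above follow a regular line format, so recurring failures can be tallied mechanically rather than read off by eye. The following is a minimal sketch, not part of the benchmark tooling: the function name `count_failed_assertions` and the regular expression are assumptions based only on the line shape visible in this report (`- ❌: <assertion> (R): fail ...`).

```python
import re
from collections import Counter

# Assumed line shape, taken from the listings in this report:
#   "- ✅: <assertion name> (R): pass ..."  or  "- ❌: <assertion name> (R): fail ..."
ASSERTION_LINE = re.compile(
    r"^- (?:✅|❌): (?P<name>.+?) \(R\): (?P<outcome>pass|fail)\b"
)


def count_failed_assertions(log_text: str) -> Counter:
    """Count how often each assertion is reported as failing in a log excerpt."""
    failures = Counter()
    for line in log_text.splitlines():
        match = ASSERTION_LINE.match(line.strip())
        if match and match.group("outcome") == "fail":
            failures[match.group("name")] += 1
    return failures


if __name__ == "__main__":
    excerpt = "\n".join([
        "- ✅: Each text input has an accessible name (R): pass",
        "- ❌: Helper text is programmatically associated (R): fail - Found `Jane Doe`",
        "- ❌: Inputs use appropriate autocomplete for purpose (R): fail",
        "- ❌: Helper text is programmatically associated (R): fail - Found `First Last`",
    ])
    for name, count in count_failed_assertions(excerpt).most_common():
        print(f"{count}x {name}")
```

Run over the GPT-5 Mini samples, a tally like this would make it easier to see that the helper-text and autocomplete assertions account for most of the failures listed.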
GPT-5.2
Pass rate by instruction set: 76% | 100% | 100% | 100%
Aggregates are shown when filtering to a specific instruction set.
Samples: 25 | Passes: 19
| pass@1 | pass@5 | pass@10 |
|---|---|---|
| 76% | 100% | 100% |
Samples: 5 | Passes: 5
| pass@1 | pass@5 | pass@10 |
|---|---|---|
| 100% | 100% | 100% |
Samples: 5 | Passes: 5
| pass@1 | pass@5 | pass@10 |
|---|---|---|
| 100% | 100% | 100% |
Samples: 5 | Passes: 5
| pass@1 | pass@5 | pass@10 |
|---|---|---|
| 100% | 100% | 100% |
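The pass@k values in these tables can be reproduced from the sample and pass counts alone. The sketch below assumes the report uses the widely used unbiased estimator, 1 - C(n-c, k) / C(n, k) for n samples with c passes. The report does not state its formula, so this is an assumption, as is capping k at n for the 5-sample sets. The function name `pass_at_k` is hypothetical.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Estimate the probability that at least one of k samples passes.

    n: total samples, c: passing samples, k: number of samples drawn
    without replacement. Assumes the estimator 1 - C(n-c, k) / C(n, k).
    """
    if n <= 0 or not 0 <= c <= n or k < 1:
        raise ValueError("require n > 0, 0 <= c <= n, k >= 1")
    # Assumption: when k exceeds the number of samples, draw all n of them.
    k = min(k, n)
    # Fewer failing samples than draws: every draw must include a pass.
    if n - c < k:
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)


if __name__ == "__main__":
    # The 25-sample, 19-pass row from the tables above.
    for k in (1, 5, 10):
        print(f"pass@{k}: {pass_at_k(25, 19, k):.0%}")
```

With 25 samples and 19 passes, pass@1 is simply 19/25 = 76%. For k = 5 the estimate is 1 - 6/53130, which rounds to the 100% shown in the table.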
Sample 0 (GPT-5.2)
Instruction set: Control
PASS | Latency 0.00s
Axe WCAG: 0 | BP: 5
Assertions ✅
- ✅: Each text input has an accessible name (R): pass
- ✅: Visible label is included in accessible name (R): pass
- ✅: Each text input has textbox role (R): pass - Text input fields with textbox role found
- ✅: Helper text is programmatically associated (R): pass
- ✅: Text inputs are keyboard focusable (R): pass
- ✅: Visual labels are defined and persistent (R): pass
- ✅: Required fields are indicated (visually and programmatically) (R): pass
- ✅: Inputs use appropriate autocomplete for purpose (R): pass
- ✅: Placeholder text is programmatically defined as a property (R): pass - No placeholder text present on text inputs
Axe Best Practice Issues (5) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- page-has-heading-one (moderate): Ensure that the page, or at least one of its frames contains a level-one heading (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 1 (GPT-5.2)
Instruction set: Control
PASS | Latency 0.00s
Axe WCAG: 0 | BP: 5
Assertions ✅
- ✅: Each text input has an accessible name (R): pass
- ✅: Visible label is included in accessible name (R): pass
- ✅: Each text input has textbox role (R): pass - Text input fields with textbox role found
- ✅: Helper text is programmatically associated (R): pass
- ✅: Text inputs are keyboard focusable (R): pass
- ✅: Visual labels are defined and persistent (R): pass
- ✅: Required fields are indicated (visually and programmatically) (R): pass
- ✅: Inputs use appropriate autocomplete for purpose (R): pass
- ✅: Placeholder text is programmatically defined as a property (R): pass - No placeholder text present on text inputs
Axe Best Practice Issues (5) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 2 (GPT-5.2)
Instruction set: Control
PASS | Latency 0.00s
Axe WCAG: 0 | BP: 5
Assertions ✅
- ✅: Each text input has an accessible name (R): pass
- ✅: Visible label is included in accessible name (R): pass
- ✅: Each text input has textbox role (R): pass - Text input fields with textbox role found
- ✅: Helper text is programmatically associated (R): pass
- ✅: Text inputs are keyboard focusable (R): pass
- ✅: Visual labels are defined and persistent (R): pass
- ✅: Required fields are indicated (visually and programmatically) (R): pass
- ✅: Inputs use appropriate autocomplete for purpose (R): pass
- ✅: Placeholder text is programmatically defined as a property (R): pass - No placeholder text present on text inputs
Axe Best Practice Issues (5) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 3 (GPT-5.2)
Instruction set: Control
PASS | Latency 0.00s
Axe WCAG: 0 | BP: 2
Assertions ✅
- ✅: Each text input has an accessible name (R): pass
- ✅: Visible label is included in accessible name (R): pass
- ✅: Each text input has textbox role (R): pass - Text input fields with textbox role found
- ✅: Helper text is programmatically associated (R): pass
- ✅: Text inputs are keyboard focusable (R): pass
- ✅: Visual labels are defined and persistent (R): pass
- ✅: Required fields are indicated (visually and programmatically) (R): pass
- ✅: Inputs use appropriate autocomplete for purpose (R): pass
- ✅: Placeholder text is programmatically defined as a property (R): pass - No placeholder text present on text inputs
Axe Best Practice Issues (2) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- page-has-heading-one (moderate): Ensure that the page, or at least one of its frames contains a level-one heading (Best Practice - does not affect pass/fail)
Sample 4 (GPT-5.2)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 0 | BP: 5
Assertions ❌
- ✅: Each text input has an accessible name (R): pass
- ✅: Visible label is included in accessible name (R): pass
- ✅: Each text input has textbox role (R): pass - Text input fields with textbox role found
- ✅: Helper text is programmatically associated (R): pass
- ✅: Text inputs are keyboard focusable (R): pass
- ✅: Visual labels are defined and persistent (R): pass
- ❌: Required fields are indicated (visually and programmatically) (R): fail - (2x) Input is programmatically required but has no visual required indicator
- ✅: Inputs use appropriate autocomplete for purpose (R): pass
- ✅: Placeholder text is programmatically defined as a property (R): pass - No placeholder text present on text inputs
Axe Best Practice Issues (5) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- page-has-heading-one (moderate): Ensure that the page, or at least one of its frames contains a level-one heading (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 5 (GPT-5.2)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 0 | BP: 5
Assertions ❌
- ✅: Each text input has an accessible name (R): pass
- ✅: Visible label is included in accessible name (R): pass
- ✅: Each text input has textbox role (R): pass - Text input fields with textbox role found
- ✅: Helper text is programmatically associated (R): pass
- ✅: Text inputs are keyboard focusable (R): pass
- ✅: Visual labels are defined and persistent (R): pass
- ❌: Required fields are indicated (visually and programmatically) (R): fail - (2x) Input is programmatically required but has no visual required indicator
- ✅: Inputs use appropriate autocomplete for purpose (R): pass
- ✅: Placeholder text is programmatically defined as a property (R): pass - No placeholder text present on text inputs
Axe Best Practice Issues (5) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- page-has-heading-one (moderate): Ensure that the page, or at least one of its frames contains a level-one heading (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 6 (GPT-5.2)
Instruction set: Control
PASS | Latency 0.00s
Axe WCAG: 0 | BP: 5
Assertions ✅
- ✅: Each text input has an accessible name (R): pass
- ✅: Visible label is included in accessible name (R): pass
- ✅: Each text input has textbox role (R): pass - Text input fields with textbox role found
- ✅: Helper text is programmatically associated (R): pass
- ✅: Text inputs are keyboard focusable (R): pass
- ✅: Visual labels are defined and persistent (R): pass
- ✅: Required fields are indicated (visually and programmatically) (R): pass
- ✅: Inputs use appropriate autocomplete for purpose (R): pass
- ✅: Placeholder text is programmatically defined as a property (R): pass - No placeholder text present on text inputs
Axe Best Practice Issues (5) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 7 (GPT-5.2)
Instruction set: Control
PASS | Latency 0.00s
Axe WCAG: 0 | BP: 5
Assertions ✅
- ✅: Each text input has an accessible name (R): pass
- ✅: Visible label is included in accessible name (R): pass
- ✅: Each text input has textbox role (R): pass - Text input fields with textbox role found
- ✅: Helper text is programmatically associated (R): pass
- ✅: Text inputs are keyboard focusable (R): pass
- ✅: Visual labels are defined and persistent (R): pass
- ✅: Required fields are indicated (visually and programmatically) (R): pass
- ✅: Inputs use appropriate autocomplete for purpose (R): pass
- ✅: Placeholder text is programmatically defined as a property (R): pass - No placeholder text present on text inputs
Axe Best Practice Issues (5) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- page-has-heading-one (moderate): Ensure that the page, or at least one of its frames contains a level-one heading (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 8 (GPT-5.2)
Instruction set: Control
PASS | Latency 0.00s
Axe WCAG: 0 | BP: 5
Assertions ✅
- ✅: Each text input has an accessible name (R): pass
- ✅: Visible label is included in accessible name (R): pass
- ✅: Each text input has textbox role (R): pass - Text input fields with textbox role found
- ✅: Helper text is programmatically associated (R): pass
- ✅: Text inputs are keyboard focusable (R): pass
- ✅: Visual labels are defined and persistent (R): pass
- ✅: Required fields are indicated (visually and programmatically) (R): pass
- ✅: Inputs use appropriate autocomplete for purpose (R): pass
- ✅: Placeholder text is programmatically defined as a property (R): pass - No placeholder text present on text inputs
Axe Best Practice Issues (5) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- page-has-heading-one (moderate): Ensure that the page, or at least one of its frames contains a level-one heading (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 9 (GPT-5.2)
Instruction set: Control
PASS | Latency 0.00s
Axe WCAG: 0 | BP: 5
Assertions ✅
- ✅: Each text input has an accessible name (R): pass
- ✅: Visible label is included in accessible name (R): pass
- ✅: Each text input has textbox role (R): pass - Text input fields with textbox role found
- ✅: Helper text is programmatically associated (R): pass
- ✅: Text inputs are keyboard focusable (R): pass
- ✅: Visual labels are defined and persistent (R): pass
- ✅: Required fields are indicated (visually and programmatically) (R): pass
- ✅: Inputs use appropriate autocomplete for purpose (R): pass
- ✅: Placeholder text is programmatically defined as a property (R): pass - No placeholder text present on text inputs
Axe Best Practice Issues (5) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 10 (GPT-5.2)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 0 | BP: 5
Assertions ❌
- ✅: Each text input has an accessible name (R): pass
- ✅: Visible label is included in accessible name (R): pass
- ✅: Each text input has textbox role (R): pass - Text input fields with textbox role found
- ✅: Helper text is programmatically associated (R): pass
- ✅: Text inputs are keyboard focusable (R): pass
- ✅: Visual labels are defined and persistent (R): pass
- ❌: Required fields are indicated (visually and programmatically) (R): fail - (2x) Input is programmatically required but has no visual required indicator
- ✅: Inputs use appropriate autocomplete for purpose (R): pass
- ✅: Placeholder text is programmatically defined as a property (R): pass - No placeholder text present on text inputs
Axe Best Practice Issues (5) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- page-has-heading-one (moderate): Ensure that the page, or at least one of its frames contains a level-one heading (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 11 (GPT-5.2)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 0 | BP: 5
Assertions ❌
- ✅: Each text input has an accessible name (R): pass
- ✅: Visible label is included in accessible name (R): pass
- ✅: Each text input has textbox role (R): pass - Text input fields with textbox role found
- ❌: Helper text is programmatically associated (R): fail - Found `Expected format: name@example.com`; Found `Preferred format: +1 555 123 4567`
- ✅: Text inputs are keyboard focusable (R): pass
- ✅: Visual labels are defined and persistent (R): pass
- ✅: Required fields are indicated (visually and programmatically) (R): pass
- ✅: Inputs use appropriate autocomplete for purpose (R): pass
- ✅: Placeholder text is programmatically defined as a property (R): pass - No placeholder text present on text inputs
Axe Best Practice Issues (5) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 12 (GPT-5.2)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 0 | BP: 5
Assertions ❌
- ✅: Each text input has an accessible name (R): pass
- ✅: Visible label is included in accessible name (R): pass
- ✅: Each text input has textbox role (R): pass - Text input fields with textbox role found
- ❌: Helper text is programmatically associated (R): fail - Found `Use a valid email format, e.g., name@example.com.`; Found `Preferred format: +1 (555) 123-4567.`
- ✅: Text inputs are keyboard focusable (R): pass
- ✅: Visual labels are defined and persistent (R): pass
- ✅: Required fields are indicated (visually and programmatically) (R): pass
- ✅: Inputs use appropriate autocomplete for purpose (R): pass
- ✅: Placeholder text is programmatically defined as a property (R): pass - No placeholder text present on text inputs
Axe Best Practice Issues (5) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 13 (GPT-5.2)
Instruction set: Control
PASS | Latency 0.00s
Axe WCAG: 0 | BP: 5
Assertions ✅
- ✅: Each text input has an accessible name (R): pass
- ✅: Visible label is included in accessible name (R): pass
- ✅: Each text input has textbox role (R): pass - Text input fields with textbox role found
- ✅: Helper text is programmatically associated (R): pass
- ✅: Text inputs are keyboard focusable (R): pass
- ✅: Visual labels are defined and persistent (R): pass
- ✅: Required fields are indicated (visually and programmatically) (R): pass
- ✅: Inputs use appropriate autocomplete for purpose (R): pass
- ✅: Placeholder text is programmatically defined as a property (R): pass - No placeholder text present on text inputs
Axe Best Practice Issues (5) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- page-has-heading-one (moderate): Ensure that the page, or at least one of its frames contains a level-one heading (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 14 (GPT-5.2)
Instruction set: Control
PASS | Latency 0.00s
Axe WCAG: 0 | BP: 5
Assertions ✅
- ✅: Each text input has an accessible name (R): pass
- ✅: Visible label is included in accessible name (R): pass
- ✅: Each text input has textbox role (R): pass - Text input fields with textbox role found
- ✅: Helper text is programmatically associated (R): pass
- ✅: Text inputs are keyboard focusable (R): pass
- ✅: Visual labels are defined and persistent (R): pass
- ✅: Required fields are indicated (visually and programmatically) (R): pass
- ✅: Inputs use appropriate autocomplete for purpose (R): pass
- ✅: Placeholder text is programmatically defined as a property (R): pass - No placeholder text present on text inputs
Axe Best Practice Issues (5) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- page-has-heading-one (moderate): Ensure that the page, or at least one of its frames contains a level-one heading (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 15 (GPT-5.2)
Instruction set: Control
PASS | Latency 0.00s
Axe WCAG: 0 | BP: 5
Assertions ✅
- ✅: Each text input has an accessible name (R): pass
- ✅: Visible label is included in accessible name (R): pass
- ✅: Each text input has textbox role (R): pass - Text input fields with textbox role found
- ✅: Helper text is programmatically associated (R): pass
- ✅: Text inputs are keyboard focusable (R): pass
- ✅: Visual labels are defined and persistent (R): pass
- ✅: Required fields are indicated (visually and programmatically) (R): pass
- ✅: Inputs use appropriate autocomplete for purpose (R): pass
- ✅: Placeholder text is programmatically defined as a property (R): pass - No placeholder text present on text inputs
Axe Best Practice Issues (5) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 16 (GPT-5.2)
Instruction set: Control
PASS | Latency 0.00s
Axe WCAG: 0 | BP: 5
Assertions ✅
- ✅: Each text input has an accessible name (R): pass
- ✅: Visible label is included in accessible name (R): pass
- ✅: Each text input has textbox role (R): pass - Text input fields with textbox role found
- ✅: Helper text is programmatically associated (R): pass
- ✅: Text inputs are keyboard focusable (R): pass
- ✅: Visual labels are defined and persistent (R): pass
- ✅: Required fields are indicated (visually and programmatically) (R): pass
- ✅: Inputs use appropriate autocomplete for purpose (R): pass
- ✅: Placeholder text is programmatically defined as a property (R): pass - No placeholder text present on text inputs
Axe Best Practice Issues (5) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 17 (GPT-5.2)
Instruction set: Control
PASS | Latency 0.00s
Axe WCAG: 0 | BP: 5
Assertions ✅
- ✅: Each text input has an accessible name (R): pass
- ✅: Visible label is included in accessible name (R): pass
- ✅: Each text input has textbox role (R): pass - Text input fields with textbox role found
- ✅: Helper text is programmatically associated (R): pass
- ✅: Text inputs are keyboard focusable (R): pass
- ✅: Visual labels are defined and persistent (R): pass
- ✅: Required fields are indicated (visually and programmatically) (R): pass
- ✅: Inputs use appropriate autocomplete for purpose (R): pass
- ✅: Placeholder text is programmatically defined as a property (R): pass - No placeholder text present on text inputs
Axe Best Practice Issues (5) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 18 (GPT-5.2)
Instruction set: Control
PASS | Latency 0.00s
Axe WCAG: 0 | BP: 2
Assertions ✅
- ✅: Each text input has an accessible name (R): pass
- ✅: Visible label is included in accessible name (R): pass
- ✅: Each text input has textbox role (R): pass - Text input fields with textbox role found
- ✅: Helper text is programmatically associated (R): pass
- ✅: Text inputs are keyboard focusable (R): pass
- ✅: Visual labels are defined and persistent (R): pass
- ✅: Required fields are indicated (visually and programmatically) (R): pass
- ✅: Inputs use appropriate autocomplete for purpose (R): pass
- ✅: Placeholder text is programmatically defined as a property (R): pass - No placeholder text present on text inputs
Axe Best Practice Issues (2) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- page-has-heading-one (moderate): Ensure that the page, or at least one of its frames contains a level-one heading (Best Practice - does not affect pass/fail)
Sample 19 (GPT-5.2)
Instruction set: Control
PASS | Latency 0.00s
Axe WCAG: 0 | BP: 2
Assertions ✅
- ✅: Each text input has an accessible name (R): pass
- ✅: Visible label is included in accessible name (R): pass
- ✅: Each text input has textbox role (R): pass - Text input fields with textbox role found
- ✅: Helper text is programmatically associated (R): pass
- ✅: Text inputs are keyboard focusable (R): pass
- ✅: Visual labels are defined and persistent (R): pass
- ✅: Required fields are indicated (visually and programmatically) (R): pass
- ✅: Inputs use appropriate autocomplete for purpose (R): pass
- ✅: Placeholder text is programmatically defined as a property (R): pass - No placeholder text present on text inputs
Axe Best Practice Issues (2) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- page-has-heading-one (moderate): Ensure that the page, or at least one of its frames contains a level-one heading (Best Practice - does not affect pass/fail)
Sample 20 (GPT-5.2)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 0 | BP: 5
Assertions ❌
- ✅: Each text input has an accessible name (R): pass
- ✅: Visible label is included in accessible name (R): pass
- ✅: Each text input has textbox role (R): pass - Text input fields with textbox role found
- ❌: Helper text is programmatically associated (R): fail - Found `Expected format: name@example.com`; Found `Preferred format: +1 (555) 123-4567`
- ✅: Text inputs are keyboard focusable (R): pass
- ✅: Visual labels are defined and persistent (R): pass
- ✅: Required fields are indicated (visually and programmatically) (R): pass
- ✅: Inputs use appropriate autocomplete for purpose (R): pass
- ✅: Placeholder text is programmatically defined as a property (R): pass - No placeholder text present on text inputs
Axe Best Practice Issues (5) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
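The helper-text failure in the sample above reports format hints that are visible on the page but not tied to their inputs. The report does not show how its assertion is implemented, so the following is only a sketch in Python of one way to detect the problem. It assumes helper text is marked with a hypothetical `class="hint"` convention and that association means the input's `aria-describedby` references the hint's `id`.

```python
from html.parser import HTMLParser


class HelperTextAudit(HTMLParser):
    """Collects ids referenced by aria-describedby and ids of helper-text elements."""

    def __init__(self):
        super().__init__()
        self.described_by = set()    # ids referenced from <input aria-describedby="...">
        self.helper_ids = set()      # ids of elements marked as helper text
        self.helpers_without_id = 0  # helper elements that cannot be referenced at all

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        if tag == "input":
            self.described_by.update((a.get("aria-describedby") or "").split())
        # Assumption: helper text carries class="hint" (hypothetical convention).
        if "hint" in (a.get("class") or "").split():
            if a.get("id"):
                self.helper_ids.add(a["id"])
            else:
                self.helpers_without_id += 1


def unassociated_helpers(html: str) -> int:
    """Return how many helper-text elements no input points to."""
    audit = HelperTextAudit()
    audit.feed(html)
    return len(audit.helper_ids - audit.described_by) + audit.helpers_without_id


failing = (
    '<label for="email">Email</label><input id="email" type="email">'
    '<p class="hint">Expected format: name@example.com</p>'
)
passing = (
    '<label for="email">Email</label>'
    '<input id="email" type="email" aria-describedby="email-hint">'
    '<p id="email-hint" class="hint">Expected format: name@example.com</p>'
)
print(unassociated_helpers(failing), unassociated_helpers(passing))
```

The fix in markup is the difference between the two strings: give the hint an `id` and reference it from the input with `aria-describedby`.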
Sample 21 (GPT-5.2)
Instruction set: Control
PASS | Latency 0.00s
Axe WCAG: 0 | BP: 5
Assertions ✅
- ✅: Each text input has an accessible name (R): pass
- ✅: Visible label is included in accessible name (R): pass
- ✅: Each text input has textbox role (R): pass - Text input fields with textbox role found
- ✅: Helper text is programmatically associated (R): pass
- ✅: Text inputs are keyboard focusable (R): pass
- ✅: Visual labels are defined and persistent (R): pass
- ✅: Required fields are indicated (visually and programmatically) (R): pass
- ✅: Inputs use appropriate autocomplete for purpose (R): pass
- ✅: Placeholder text is programmatically defined as a property (R): pass - No placeholder text present on text inputs
Axe Best Practice Issues (5) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- page-has-heading-one (moderate): Ensure that the page, or at least one of its frames contains a level-one heading (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 22 (GPT-5.2)
Instruction set: Control
PASS | Latency 0.00s
Axe WCAG: 0 | BP: 5
Assertions ✅
- ✅: Each text input has an accessible name (R): pass
- ✅: Visible label is included in accessible name (R): pass
- ✅: Each text input has textbox role (R): pass - Text input fields with textbox role found
- ✅: Helper text is programmatically associated (R): pass
- ✅: Text inputs are keyboard focusable (R): pass
- ✅: Visual labels are defined and persistent (R): pass
- ✅: Required fields are indicated (visually and programmatically) (R): pass
- ✅: Inputs use appropriate autocomplete for purpose (R): pass
- ✅: Placeholder text is programmatically defined as a property (R): pass - No placeholder text present on text inputs
Axe Best Practice Issues (5) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- page-has-heading-one (moderate): Ensure that the page, or at least one of its frames contains a level-one heading (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 23 (GPT-5.2)
Instruction set: Control
PASS | Latency 0.00s
Axe WCAG: 0 | BP: 5
Assertions ✅
- ✅: Each text input has an accessible name (R): pass
- ✅: Visible label is included in accessible name (R): pass
- ✅: Each text input has textbox role (R): pass - Text input fields with textbox role found
- ✅: Helper text is programmatically associated (R): pass
- ✅: Text inputs are keyboard focusable (R): pass
- ✅: Visual labels are defined and persistent (R): pass
- ✅: Required fields are indicated (visually and programmatically) (R): pass
- ✅: Inputs use appropriate autocomplete for purpose (R): pass
- ✅: Placeholder text is programmatically defined as a property (R): pass - No placeholder text present on text inputs
Axe Best Practice Issues (5) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- page-has-heading-one (moderate): Ensure that the page, or at least one of its frames contains a level-one heading (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 24 (GPT-5.2)
Instruction set: Control
PASS | Latency 0.00s
Axe WCAG: 0 | BP: 5
Assertions ✅
- ✅: Each text input has an accessible name (R): pass
- ✅: Visible label is included in accessible name (R): pass
- ✅: Each text input has textbox role (R): pass - Text input fields with textbox role found
- ✅: Helper text is programmatically associated (R): pass
- ✅: Text inputs are keyboard focusable (R): pass
- ✅: Visual labels are defined and persistent (R): pass
- ✅: Required fields are indicated (visually and programmatically) (R): pass
- ✅: Inputs use appropriate autocomplete for purpose (R): pass
- ✅: Placeholder text is programmatically defined as a property (R): pass - No placeholder text present on text inputs
Axe Best Practice Issues (5) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 0 (GPT-5.2)
Instruction set: 1. Basic
PASS | Latency 0.00s
Axe WCAG: 0
Assertions ✅
- ✅: Each text input has an accessible name (R): pass
- ✅: Visible label is included in accessible name (R): pass
- ✅: Each text input has textbox role (R): pass - Text input fields with textbox role found
- ✅: Helper text is programmatically associated (R): pass
- ✅: Text inputs are keyboard focusable (R): pass
- ✅: Visual labels are defined and persistent (R): pass
- ✅: Required fields are indicated (visually and programmatically) (R): pass
- ✅: Inputs use appropriate autocomplete for purpose (R): pass
- ✅: Placeholder text is programmatically defined as a property (R): pass - No placeholder text present on text inputs
Sample 1 (GPT-5.2)
Instruction set: 1. Basic
PASS | Latency 0.00s
Axe WCAG: 0
Assertions ✅
- ✅: Each text input has an accessible name (R): pass
- ✅: Visible label is included in accessible name (R): pass
- ✅: Each text input has textbox role (R): pass - Text input fields with textbox role found
- ✅: Helper text is programmatically associated (R): pass
- ✅: Text inputs are keyboard focusable (R): pass
- ✅: Visual labels are defined and persistent (R): pass
- ✅: Required fields are indicated (visually and programmatically) (R): pass
- ✅: Inputs use appropriate autocomplete for purpose (R): pass
- ✅: Placeholder text is programmatically defined as a property (R): pass - No placeholder text present on text inputs
Sample 2 (GPT-5.2)
Instruction set: 1. Basic
PASS | Latency 0.00s
Axe WCAG: 0
Assertions ✅
- ✅: Each text input has an accessible name (R): pass
- ✅: Visible label is included in accessible name (R): pass
- ✅: Each text input has textbox role (R): pass - Text input fields with textbox role found
- ✅: Helper text is programmatically associated (R): pass
- ✅: Text inputs are keyboard focusable (R): pass
- ✅: Visual labels are defined and persistent (R): pass
- ✅: Required fields are indicated (visually and programmatically) (R): pass
- ✅: Inputs use appropriate autocomplete for purpose (R): pass
- ✅: Placeholder text is programmatically defined as a property (R): pass - No placeholder text present on text inputs
Sample 3 (GPT-5.2)
Instruction set: 1. Basic
PASS | Latency 0.00s
Axe WCAG: 0
Assertions ✅
- ✅: Each text input has an accessible name (R): pass
- ✅: Visible label is included in accessible name (R): pass
- ✅: Each text input has textbox role (R): pass - Text input fields with textbox role found
- ✅: Helper text is programmatically associated (R): pass
- ✅: Text inputs are keyboard focusable (R): pass
- ✅: Visual labels are defined and persistent (R): pass
- ✅: Required fields are indicated (visually and programmatically) (R): pass
- ✅: Inputs use appropriate autocomplete for purpose (R): pass
- ✅: Placeholder text is programmatically defined as a property (R): pass
Sample 4 (GPT-5.2)
Instruction set: 1. Basic
PASS | Latency 0.00s
Axe WCAG: 0
Assertions ✅
- ✅: Each text input has an accessible name (R): pass
- ✅: Visible label is included in accessible name (R): pass
- ✅: Each text input has textbox role (R): pass - Text input fields with textbox role found
- ✅: Helper text is programmatically associated (R): pass
- ✅: Text inputs are keyboard focusable (R): pass
- ✅: Visual labels are defined and persistent (R): pass
- ✅: Required fields are indicated (visually and programmatically) (R): pass
- ✅: Inputs use appropriate autocomplete for purpose (R): pass
- ✅: Placeholder text is programmatically defined as a property (R): pass
Sample 0 (GPT-5.2)
Instruction set: 0. Minimal
PASS | Latency 0.00s
Axe WCAG: 0
Assertions ✅
- ✅: Each text input has an accessible name (R): pass
- ✅: Visible label is included in accessible name (R): pass
- ✅: Each text input has textbox role (R): pass - Text input fields with textbox role found
- ✅: Helper text is programmatically associated (R): pass
- ✅: Text inputs are keyboard focusable (R): pass
- ✅: Visual labels are defined and persistent (R): pass
- ✅: Required fields are indicated (visually and programmatically) (R): pass
- ✅: Inputs use appropriate autocomplete for purpose (R): pass
- ✅: Placeholder text is programmatically defined as a property (R): pass - No placeholder text present on text inputs
Sample 1 (GPT-5.2)
Instruction set: 0. Minimal
PASS | Latency 0.00s
Axe WCAG: 0
Assertions ✅
- ✅: Each text input has an accessible name (R): pass
- ✅: Visible label is included in accessible name (R): pass
- ✅: Each text input has textbox role (R): pass - Text input fields with textbox role found
- ✅: Helper text is programmatically associated (R): pass
- ✅: Text inputs are keyboard focusable (R): pass
- ✅: Visual labels are defined and persistent (R): pass
- ✅: Required fields are indicated (visually and programmatically) (R): pass
- ✅: Inputs use appropriate autocomplete for purpose (R): pass
- ✅: Placeholder text is programmatically defined as a property (R): pass - No placeholder text present on text inputs
Sample 2 (GPT-5.2)
Instruction set: 0. Minimal
PASS | Latency 0.00s
Axe WCAG: 0
Assertions ✅
- ✅: Each text input has an accessible name (R): pass
- ✅: Visible label is included in accessible name (R): pass
- ✅: Each text input has textbox role (R): pass - Text input fields with textbox role found
- ✅: Helper text is programmatically associated (R): pass
- ✅: Text inputs are keyboard focusable (R): pass
- ✅: Visual labels are defined and persistent (R): pass
- ✅: Required fields are indicated (visually and programmatically) (R): pass
- ✅: Inputs use appropriate autocomplete for purpose (R): pass
- ✅: Placeholder text is programmatically defined as a property (R): pass - No placeholder text present on text inputs
Sample 3 (GPT-5.2)
Instruction set: 0. Minimal
PASS | Latency 0.00s
Axe WCAG: 0
Assertions ✅
- ✅: Each text input has an accessible name (R): pass
- ✅: Visible label is included in accessible name (R): pass
- ✅: Each text input has textbox role (R): pass - Text input fields with textbox role found
- ✅: Helper text is programmatically associated (R): pass
- ✅: Text inputs are keyboard focusable (R): pass
- ✅: Visual labels are defined and persistent (R): pass
- ✅: Required fields are indicated (visually and programmatically) (R): pass
- ✅: Inputs use appropriate autocomplete for purpose (R): pass
- ✅: Placeholder text is programmatically defined as a property (R): pass - No placeholder text present on text inputs
Sample 4 (GPT-5.2)
Instruction set: 0. Minimal
PASS | Latency 0.00s
Axe WCAG: 0
Assertions ✅
- ✅: Each text input has an accessible name (R): pass
- ✅: Visible label is included in accessible name (R): pass
- ✅: Each text input has textbox role (R): pass - Text input fields with textbox role found
- ✅: Helper text is programmatically associated (R): pass
- ✅: Text inputs are keyboard focusable (R): pass
- ✅: Visual labels are defined and persistent (R): pass
- ✅: Required fields are indicated (visually and programmatically) (R): pass
- ✅: Inputs use appropriate autocomplete for purpose (R): pass
- ✅: Placeholder text is programmatically defined as a property (R): pass - No placeholder text present on text inputs
Sample 0 (GPT-5.2)
Instruction set: 2. Detailed Instructions
PASS | Latency 0.00s
Axe WCAG: 0
Assertions ✅
- ✅: Each text input has an accessible name (R): pass
- ✅: Visible label is included in accessible name (R): pass
- ✅: Each text input has textbox role (R): pass - Text input fields with textbox role found
- ✅: Helper text is programmatically associated (R): pass
- ✅: Text inputs are keyboard focusable (R): pass
- ✅: Visual labels are defined and persistent (R): pass
- ✅: Required fields are indicated (visually and programmatically) (R): pass
- ✅: Inputs use appropriate autocomplete for purpose (R): pass
- ✅: Placeholder text is programmatically defined as a property (R): pass - No placeholder text present on text inputs
Sample 1 (GPT-5.2)
Instruction set: 2. Detailed Instructions
PASS | Latency 0.00s
Axe WCAG: 0
Assertions ✅
- ✅: Each text input has an accessible name (R): pass
- ✅: Visible label is included in accessible name (R): pass
- ✅: Each text input has textbox role (R): pass - Text input fields with textbox role found
- ✅: Helper text is programmatically associated (R): pass
- ✅: Text inputs are keyboard focusable (R): pass
- ✅: Visual labels are defined and persistent (R): pass
- ✅: Required fields are indicated (visually and programmatically) (R): pass
- ✅: Inputs use appropriate autocomplete for purpose (R): pass
- ✅: Placeholder text is programmatically defined as a property (R): pass - No placeholder text present on text inputs
Sample 2 (GPT-5.2)
Instruction set: 2. Detailed Instructions
PASS | Latency 0.00s
Axe WCAG: 0
Assertions ✅
- ✅: Each text input has an accessible name (R): pass
- ✅: Visible label is included in accessible name (R): pass
- ✅: Each text input has textbox role (R): pass - Text input fields with textbox role found
- ✅: Helper text is programmatically associated (R): pass
- ✅: Text inputs are keyboard focusable (R): pass
- ✅: Visual labels are defined and persistent (R): pass
- ✅: Required fields are indicated (visually and programmatically) (R): pass
- ✅: Inputs use appropriate autocomplete for purpose (R): pass
- ✅: Placeholder text is programmatically defined as a property (R): pass - No placeholder text present on text inputs
Sample 3 (GPT-5.2)
Instruction set: 2. Detailed Instructions
PASS | Latency 0.00s
Axe WCAG: 0
Assertions ✅
- ✅: Each text input has an accessible name (R): pass
- ✅: Visible label is included in accessible name (R): pass
- ✅: Each text input has textbox role (R): pass - Text input fields with textbox role found
- ✅: Helper text is programmatically associated (R): pass
- ✅: Text inputs are keyboard focusable (R): pass
- ✅: Visual labels are defined and persistent (R): pass
- ✅: Required fields are indicated (visually and programmatically) (R): pass
- ✅: Inputs use appropriate autocomplete for purpose (R): pass
- ✅: Placeholder text is programmatically defined as a property (R): pass - No placeholder text present on text inputs
Sample 4 (GPT-5.2)
Instruction set: 2. Detailed Instructions
PASS | Latency 0.00s
Axe WCAG: 0
Assertions ✅
- ✅: Each text input has an accessible name (R): pass
- ✅: Visible label is included in accessible name (R): pass
- ✅: Each text input has textbox role (R): pass - Text input fields with textbox role found
- ✅: Helper text is programmatically associated (R): pass
- ✅: Text inputs are keyboard focusable (R): pass
- ✅: Visual labels are defined and persistent (R): pass
- ✅: Required fields are indicated (visually and programmatically) (R): pass
- ✅: Inputs use appropriate autocomplete for purpose (R): pass
- ✅: Placeholder text is programmatically defined as a property (R): pass - No placeholder text present on text inputs
GPT-5.2 Codex
| Samples | Passes | pass@1 | pass@5 | pass@10 |
|---|---|---|---|---|
| 25 | 0 | 0% | 0% | 0% |
| 5 | 3 | 60% | 100% | 100% |
| 5 | 5 | 100% | 100% | 100% |
| 5 | 3 | 60% | 100% | 100% |
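The pass@k figures above are consistent with the standard unbiased estimator 1 - C(n-c, k) / C(n, k) for n samples with c passes. A minimal Python sketch under that assumption follows. The report does not say how it handles k larger than the number of samples (for example pass@10 with 5 samples), so capping k at n is a guess, and `pass_at_k` is a hypothetical name.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Chance that at least one of k samples drawn without replacement from n passes."""
    if n <= 0 or not 0 <= c <= n:
        raise ValueError("need n > 0 and 0 <= c <= n")
    k = min(k, n)  # assumption: cap k at the number of available samples
    # comb(n - c, k) is 0 when fewer than k failing samples exist,
    # so the estimate becomes 1.0 in that case.
    return 1.0 - comb(n - c, k) / comb(n, k)


# (samples, passes) pairs taken from the aggregate rows above.
for n, c in [(25, 0), (5, 3), (5, 5)]:
    print(n, c, [f"{pass_at_k(n, c, k):.0%}" for k in (1, 5, 10)])
```

With 5 samples and 3 passes, any draw of 5 must include a passing sample, which is why pass@5 reaches 100% even though pass@1 is only 60%.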
Sample 0 (GPT-5.2 Codex)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 0 | BP: 5
Assertions ❌
- ✅: Each text input has an accessible name (R): pass
- ✅: Visible label is included in accessible name (R): pass
- ✅: Each text input has textbox role (R): pass - Text input fields with textbox role found
- ❌: Helper text is programmatically associated (R): fail - Found `Format: name@example.com`; Found `Preferred format: +1 (555) 123-4567`
- ✅: Text inputs are keyboard focusable (R): pass
- ✅: Visual labels are defined and persistent (R): pass
- ❌: Required fields are indicated (visually and programmatically) (R): fail - Input is programmatically required but has no visual required indicator; Input is programmatically required but has no visual required indicator
- ❌: Inputs use appropriate autocomplete for purpose (R): fail
- ✅: Placeholder text is programmatically defined as a property (R): pass - No placeholder text present on text inputs
Axe Best Practice Issues (5) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- page-has-heading-one (moderate): Ensure that the page, or at least one of its frames contains a level-one heading (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
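The two other failures in the sample above, a required field with no visual indicator and a missing autocomplete purpose, can both be read off the markup. The Python sketch below is illustrative only and is not the report's own assertion code. It assumes an asterisk or the word "required" in the label counts as a visual indicator, and it uses a hypothetical `EXPECTED_AUTOCOMPLETE` mapping from input type to autocomplete token.

```python
from html.parser import HTMLParser

# Assumption: expected autocomplete tokens keyed by input type (illustrative only).
EXPECTED_AUTOCOMPLETE = {"email": "email", "tel": "tel"}


class FieldAudit(HTMLParser):
    """Records each <input> and the text of the <label for="..."> elements."""

    def __init__(self):
        super().__init__()
        self.inputs = []          # attribute dict per <input>
        self.labels = {}          # label "for" target -> accumulated label text
        self._current_for = None  # id targeted by the label currently open

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        if tag == "input":
            self.inputs.append(a)
        elif tag == "label":
            self._current_for = a.get("for")
            if self._current_for:
                self.labels[self._current_for] = ""

    def handle_data(self, data):
        if self._current_for:
            self.labels[self._current_for] += data

    def handle_endtag(self, tag):
        if tag == "label":
            self._current_for = None


def audit_fields(html: str) -> list:
    parser = FieldAudit()
    parser.feed(html)
    problems = []
    for a in parser.inputs:
        name = a.get("id") or "?"
        expected = EXPECTED_AUTOCOMPLETE.get(a.get("type") or "text")
        if expected and expected not in (a.get("autocomplete") or "").split():
            problems.append(f"{name}: missing autocomplete={expected}")
        label = parser.labels.get(a.get("id"), "")
        # Assumption: "*" or the word "required" in the label is the visual cue.
        if "required" in a and not ("*" in label or "required" in label.lower()):
            problems.append(f"{name}: required but no visual indicator")
    return problems


failing = '<label for="email">Email</label><input id="email" type="email" required>'
passing = (
    '<label for="email">Email (required)</label>'
    '<input id="email" type="email" autocomplete="email" required>'
)
print(audit_fields(failing))
print(audit_fields(passing))
```

A real check would also need to confirm that the indicator is actually rendered and perceivable, which static parsing of the markup cannot establish.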
Sample 1 (GPT-5.2 Codex)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 0 | BP: 5
Assertions ❌
- ✅: Each text input has an accessible name (R): pass
- ✅: Visible label is included in accessible name (R): pass
- ✅: Each text input has textbox role (R): pass - Text input fields with textbox role found
- ❌: Helper text is programmatically associated (R): fail - Found `Expected format: name@example.com`; Found `Preferred format: (123) 456-7890`
- ✅: Text inputs are keyboard focusable (R): pass
- ✅: Visual labels are defined and persistent (R): pass
- ❌: Required fields are indicated (visually and programmatically) (R): fail - Input is programmatically required but has no visual required indicator; Input is programmatically required but has no visual required indicator
- ❌: Inputs use appropriate autocomplete for purpose (R): fail
- ✅: Placeholder text is programmatically defined as a property (R): pass - No placeholder text present on text inputs
Axe Best Practice Issues (5) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- page-has-heading-one (moderate): Ensure that the page, or at least one of its frames contains a level-one heading (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 2 (GPT-5.2 Codex)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 0 | BP: 5
Assertions ❌
- ✅: Each text input has an accessible name (R): pass
- ✅: Visible label is included in accessible name (R): pass
- ✅: Each text input has textbox role (R): pass - Text input fields with textbox role found
- ❌: Helper text is programmatically associated (R): fail - Found `Please enter a valid email address (e.g., name@example.com).`; Found `Preferred format: +1 (555) 123-4567.`
- ✅: Text inputs are keyboard focusable (R): pass
- ✅: Visual labels are defined and persistent (R): pass
- ❌: Required fields are indicated (visually and programmatically) (R): fail - Input is programmatically required but has no visual required indicator; Input is programmatically required but has no visual required indicator
- ❌: Inputs use appropriate autocomplete for purpose (R): fail
- ✅: Placeholder text is programmatically defined as a property (R): pass - No placeholder text present on text inputs
Axe Best Practice Issues (5) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- page-has-heading-one (moderate): Ensure that the page, or at least one of its frames contains a level-one heading (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 3 (GPT-5.2 Codex)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 0 | BP: 5
Assertions ❌
- ✅: Each text input has an accessible name (R): pass
- ✅: Visible label is included in accessible name (R): pass
- ✅: Each text input has textbox role (R): pass - Text input fields with textbox role found
- ❌: Helper text is programmatically associated (R): fail - Found `Expected format: name@example.com`; Found `Preferred format: (123) 456-7890`
- ✅: Text inputs are keyboard focusable (R): pass
- ✅: Visual labels are defined and persistent (R): pass
- ❌: Required fields are indicated (visually and programmatically) (R): fail - Input is programmatically required but has no visual required indicator; Input is programmatically required but has no visual required indicator
- ❌: Inputs use appropriate autocomplete for purpose (R): fail
- ✅: Placeholder text is programmatically defined as a property (R): pass - No placeholder text present on text inputs
Axe Best Practice Issues (5) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- page-has-heading-one (moderate): Ensure that the page, or at least one of its frames contains a level-one heading (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 4 (GPT-5.2 Codex)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 0 | BP: 5
Assertions ❌
- ✅: Each text input has an accessible name (R): pass
- ✅: Visible label is included in accessible name (R): pass
- ✅: Each text input has textbox role (R): pass - Text input fields with textbox role found
- ❌: Helper text is programmatically associated (R): fail - Found `Please enter a valid email format, e.g., name@example.com.`; Found `Preferred format: (123) 456-7890.`
- ✅: Text inputs are keyboard focusable (R): pass
- ✅: Visual labels are defined and persistent (R): pass
- ❌: Required fields are indicated (visually and programmatically) (R): fail - Input is programmatically required but has no visual required indicator; Input is programmatically required but has no visual required indicator
- ❌: Inputs use appropriate autocomplete for purpose (R): fail
- ✅: Placeholder text is programmatically defined as a property (R): pass - No placeholder text present on text inputs
Axe Best Practice Issues (5) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- page-has-heading-one (moderate): Ensure that the page, or at least one of its frames contains a level-one heading (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 5 (GPT-5.2 Codex)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 0 | BP: 5
Assertions ❌
- ✅: Each text input has an accessible name (R): pass
- ✅: Visible label is included in accessible name (R): pass
- ✅: Each text input has textbox role (R): pass - Text input fields with textbox role found
- ❌: Helper text is programmatically associated (R): fail - Found `Format: name@example.com`; Found `Preferred format: (123) 456-7890`
- ✅: Text inputs are keyboard focusable (R): pass
- ✅: Visual labels are defined and persistent (R): pass
- ❌: Required fields are indicated (visually and programmatically) (R): fail - Input is programmatically required but has no visual required indicator; Input is programmatically required but has no visual required indicator
- ❌: Inputs use appropriate autocomplete for purpose (R): fail
- ✅: Placeholder text is programmatically defined as a property (R): pass - No placeholder text present on text inputs
Axe Best Practice Issues (5) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- page-has-heading-one (moderate): Ensure that the page, or at least one of its frames contains a level-one heading (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 6 (GPT-5.2 Codex)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 0 | BP: 5
Assertions ❌
- ✅: Each text input has an accessible name (R): pass
- ✅: Visible label is included in accessible name (R): pass
- ✅: Each text input has textbox role (R): pass - Text input fields with textbox role found
- ❌: Helper text is programmatically associated (R): fail - Found `Expected format: name@example.com`; Found `Preferred format: (123) 456-7890`
- ✅: Text inputs are keyboard focusable (R): pass
- ✅: Visual labels are defined and persistent (R): pass
- ❌: Required fields are indicated (visually and programmatically) (R): fail - Input is programmatically required but has no visual required indicator; Input is programmatically required but has no visual required indicator
- ❌: Inputs use appropriate autocomplete for purpose (R): fail
- ✅: Placeholder text is programmatically defined as a property (R): pass - No placeholder text present on text inputs
Axe Best Practice Issues (5) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- page-has-heading-one (moderate): Ensure that the page, or at least one of its frames contains a level-one heading (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 7 (GPT-5.2 Codex)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 0 | BP: 5
Assertions ❌
- ✅: Each text input has an accessible name (R): pass
- ✅: Visible label is included in accessible name (R): pass
- ✅: Each text input has textbox role (R): pass - Text input fields with textbox role found
- ❌: Helper text is programmatically associated (R): fail - Found `Format: name@example.com`; Found `Preferred format: (123) 456-7890`
- ✅: Text inputs are keyboard focusable (R): pass
- ✅: Visual labels are defined and persistent (R): pass
- ❌: Required fields are indicated (visually and programmatically) (R): fail - Input is programmatically required but has no visual required indicator (reported twice)
- ❌: Inputs use appropriate autocomplete for purpose (R): fail
- ✅: Placeholder text is programmatically defined as a property (R): pass - No placeholder text present on text inputs
Axe Best Practice Issues (5) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- page-has-heading-one (moderate): Ensure that the page, or at least one of its frames contains a level-one heading (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 8 (GPT-5.2 Codex)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 0 | BP: 5
Assertions ❌
- ✅: Each text input has an accessible name (R): pass
- ✅: Visible label is included in accessible name (R): pass
- ✅: Each text input has textbox role (R): pass - Text input fields with textbox role found
- ❌: Helper text is programmatically associated (R): fail - Found `Format: name@example.com`; Found `Preferred format: (123) 456-7890`
- ✅: Text inputs are keyboard focusable (R): pass
- ✅: Visual labels are defined and persistent (R): pass
- ❌: Required fields are indicated (visually and programmatically) (R): fail - Input is programmatically required but has no visual required indicator (reported twice)
- ❌: Inputs use appropriate autocomplete for purpose (R): fail
- ✅: Placeholder text is programmatically defined as a property (R): pass - No placeholder text present on text inputs
Axe Best Practice Issues (5) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- page-has-heading-one (moderate): Ensure that the page, or at least one of its frames contains a level-one heading (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 9 (GPT-5.2 Codex)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 0 | BP: 5
Assertions ❌
- ✅: Each text input has an accessible name (R): pass
- ✅: Visible label is included in accessible name (R): pass
- ✅: Each text input has textbox role (R): pass - Text input fields with textbox role found
- ❌: Helper text is programmatically associated (R): fail - Found `Format: name@example.com`; Found `Preferred format: (123) 456-7890`
- ✅: Text inputs are keyboard focusable (R): pass
- ✅: Visual labels are defined and persistent (R): pass
- ❌: Required fields are indicated (visually and programmatically) (R): fail - Input is programmatically required but has no visual required indicator (reported twice)
- ❌: Inputs use appropriate autocomplete for purpose (R): fail
- ✅: Placeholder text is programmatically defined as a property (R): pass - No placeholder text present on text inputs
Axe Best Practice Issues (5) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- page-has-heading-one (moderate): Ensure that the page, or at least one of its frames contains a level-one heading (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 10 (GPT-5.2 Codex)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 0 | BP: 5
Assertions ❌
- ✅: Each text input has an accessible name (R): pass
- ✅: Visible label is included in accessible name (R): pass
- ✅: Each text input has textbox role (R): pass - Text input fields with textbox role found
- ❌: Helper text is programmatically associated (R): fail - Found `Format: name@example.com`; Found `Preferred format: (123) 456-7890`
- ✅: Text inputs are keyboard focusable (R): pass
- ✅: Visual labels are defined and persistent (R): pass
- ❌: Required fields are indicated (visually and programmatically) (R): fail - Input is programmatically required but has no visual required indicator (reported twice)
- ❌: Inputs use appropriate autocomplete for purpose (R): fail
- ✅: Placeholder text is programmatically defined as a property (R): pass - No placeholder text present on text inputs
Axe Best Practice Issues (5) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- page-has-heading-one (moderate): Ensure that the page, or at least one of its frames contains a level-one heading (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 11 (GPT-5.2 Codex)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 0 | BP: 5
Assertions ❌
- ✅: Each text input has an accessible name (R): pass
- ✅: Visible label is included in accessible name (R): pass
- ✅: Each text input has textbox role (R): pass - Text input fields with textbox role found
- ❌: Helper text is programmatically associated (R): fail - Found `Please enter a valid email address (e.g., name@example.com).`; Found `Preferred format: (123) 456-7890.`
- ✅: Text inputs are keyboard focusable (R): pass
- ✅: Visual labels are defined and persistent (R): pass
- ❌: Required fields are indicated (visually and programmatically) (R): fail - Input is programmatically required but has no visual required indicator (reported twice)
- ❌: Inputs use appropriate autocomplete for purpose (R): fail
- ✅: Placeholder text is programmatically defined as a property (R): pass - No placeholder text present on text inputs
Axe Best Practice Issues (5) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- page-has-heading-one (moderate): Ensure that the page, or at least one of its frames contains a level-one heading (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 12 (GPT-5.2 Codex)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 0 | BP: 5
Assertions ❌
- ✅: Each text input has an accessible name (R): pass
- ✅: Visible label is included in accessible name (R): pass
- ✅: Each text input has textbox role (R): pass - Text input fields with textbox role found
- ❌: Helper text is programmatically associated (R): fail - Found `Expected format: name@example.com`; Found `Preferred format: (123) 456-7890`
- ✅: Text inputs are keyboard focusable (R): pass
- ✅: Visual labels are defined and persistent (R): pass
- ❌: Required fields are indicated (visually and programmatically) (R): fail - Input is programmatically required but has no visual required indicator (reported twice)
- ❌: Inputs use appropriate autocomplete for purpose (R): fail
- ✅: Placeholder text is programmatically defined as a property (R): pass - No placeholder text present on text inputs
Axe Best Practice Issues (5) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- page-has-heading-one (moderate): Ensure that the page, or at least one of its frames contains a level-one heading (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 13 (GPT-5.2 Codex)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 0 | BP: 5
Assertions ❌
- ✅: Each text input has an accessible name (R): pass
- ✅: Visible label is included in accessible name (R): pass
- ✅: Each text input has textbox role (R): pass - Text input fields with textbox role found
- ❌: Helper text is programmatically associated (R): fail - Found `Format: name@example.com`; Found `Preferred format: (123) 456-7890`
- ✅: Text inputs are keyboard focusable (R): pass
- ✅: Visual labels are defined and persistent (R): pass
- ❌: Required fields are indicated (visually and programmatically) (R): fail - Input is programmatically required but has no visual required indicator (reported twice)
- ❌: Inputs use appropriate autocomplete for purpose (R): fail
- ✅: Placeholder text is programmatically defined as a property (R): pass - No placeholder text present on text inputs
Axe Best Practice Issues (5) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- page-has-heading-one (moderate): Ensure that the page, or at least one of its frames contains a level-one heading (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 14 (GPT-5.2 Codex)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 0 | BP: 5
Assertions ❌
- ✅: Each text input has an accessible name (R): pass
- ✅: Visible label is included in accessible name (R): pass
- ✅: Each text input has textbox role (R): pass - Text input fields with textbox role found
- ❌: Helper text is programmatically associated (R): fail - Found `Format: name@example.com`; Found `Preferred format: +1 (555) 123-4567`
- ✅: Text inputs are keyboard focusable (R): pass
- ✅: Visual labels are defined and persistent (R): pass
- ❌: Required fields are indicated (visually and programmatically) (R): fail - Input is programmatically required but has no visual required indicator (reported twice)
- ❌: Inputs use appropriate autocomplete for purpose (R): fail
- ✅: Placeholder text is programmatically defined as a property (R): pass - No placeholder text present on text inputs
Axe Best Practice Issues (5) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- page-has-heading-one (moderate): Ensure that the page, or at least one of its frames contains a level-one heading (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 15 (GPT-5.2 Codex)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 0 | BP: 5
Assertions ❌
- ✅: Each text input has an accessible name (R): pass
- ✅: Visible label is included in accessible name (R): pass
- ✅: Each text input has textbox role (R): pass - Text input fields with textbox role found
- ❌: Helper text is programmatically associated (R): fail - Found `Please enter a valid email address (e.g., name@example.com).`; Found `Preferred format: (123) 456-7890.`
- ✅: Text inputs are keyboard focusable (R): pass
- ✅: Visual labels are defined and persistent (R): pass
- ❌: Required fields are indicated (visually and programmatically) (R): fail - Input is programmatically required but has no visual required indicator (reported twice)
- ❌: Inputs use appropriate autocomplete for purpose (R): fail
- ✅: Placeholder text is programmatically defined as a property (R): pass - No placeholder text present on text inputs
Axe Best Practice Issues (5) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- page-has-heading-one (moderate): Ensure that the page, or at least one of its frames contains a level-one heading (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 16 (GPT-5.2 Codex)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 0 | BP: 5
Assertions ❌
- ✅: Each text input has an accessible name (R): pass
- ✅: Visible label is included in accessible name (R): pass
- ✅: Each text input has textbox role (R): pass - Text input fields with textbox role found
- ❌: Helper text is programmatically associated (R): fail - Found `Expected format: name@example.com`; Found `Preferred format: +1 (555) 123-4567`
- ✅: Text inputs are keyboard focusable (R): pass
- ✅: Visual labels are defined and persistent (R): pass
- ❌: Required fields are indicated (visually and programmatically) (R): fail - Input is programmatically required but has no visual required indicator (reported twice)
- ❌: Inputs use appropriate autocomplete for purpose (R): fail
- ✅: Placeholder text is programmatically defined as a property (R): pass - No placeholder text present on text inputs
Axe Best Practice Issues (5) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- page-has-heading-one (moderate): Ensure that the page, or at least one of its frames contains a level-one heading (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 17 (GPT-5.2 Codex)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 0 | BP: 5
Assertions ❌
- ✅: Each text input has an accessible name (R): pass
- ✅: Visible label is included in accessible name (R): pass
- ✅: Each text input has textbox role (R): pass - Text input fields with textbox role found
- ❌: Helper text is programmatically associated (R): fail - Found `Please enter a valid email address (e.g., name@example.com).`; Found `Preferred format: (123) 456-7890.`
- ✅: Text inputs are keyboard focusable (R): pass
- ✅: Visual labels are defined and persistent (R): pass
- ❌: Required fields are indicated (visually and programmatically) (R): fail - Input is programmatically required but has no visual required indicator (reported twice)
- ❌: Inputs use appropriate autocomplete for purpose (R): fail
- ✅: Placeholder text is programmatically defined as a property (R): pass - No placeholder text present on text inputs
Axe Best Practice Issues (5) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- page-has-heading-one (moderate): Ensure that the page, or at least one of its frames contains a level-one heading (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 18 (GPT-5.2 Codex)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 0 | BP: 5
Assertions ❌
- ✅: Each text input has an accessible name (R): pass
- ✅: Visible label is included in accessible name (R): pass
- ✅: Each text input has textbox role (R): pass - Text input fields with textbox role found
- ❌: Helper text is programmatically associated (R): fail - Found `Format: name@example.com`; Found `Preferred format: (123) 456-7890`
- ✅: Text inputs are keyboard focusable (R): pass
- ✅: Visual labels are defined and persistent (R): pass
- ❌: Required fields are indicated (visually and programmatically) (R): fail - Input is programmatically required but has no visual required indicator (reported twice)
- ❌: Inputs use appropriate autocomplete for purpose (R): fail
- ✅: Placeholder text is programmatically defined as a property (R): pass - No placeholder text present on text inputs
Axe Best Practice Issues (5) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- page-has-heading-one (moderate): Ensure that the page, or at least one of its frames contains a level-one heading (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 19 (GPT-5.2 Codex)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 0 | BP: 5
Assertions ❌
- ✅: Each text input has an accessible name (R): pass
- ✅: Visible label is included in accessible name (R): pass
- ✅: Each text input has textbox role (R): pass - Text input fields with textbox role found
- ❌: Helper text is programmatically associated (R): fail - Found `Format: name@example.com`; Found `Preferred format: (123) 456-7890`
- ✅: Text inputs are keyboard focusable (R): pass
- ✅: Visual labels are defined and persistent (R): pass
- ❌: Required fields are indicated (visually and programmatically) (R): fail - Input is programmatically required but has no visual required indicator (reported twice)
- ❌: Inputs use appropriate autocomplete for purpose (R): fail
- ✅: Placeholder text is programmatically defined as a property (R): pass - No placeholder text present on text inputs
Axe Best Practice Issues (5) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- page-has-heading-one (moderate): Ensure that the page, or at least one of its frames contains a level-one heading (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 20 (GPT-5.2 Codex)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 0 | BP: 5
Assertions ❌
- ✅: Each text input has an accessible name (R): pass
- ✅: Visible label is included in accessible name (R): pass
- ✅: Each text input has textbox role (R): pass - Text input fields with textbox role found
- ❌: Helper text is programmatically associated (R): fail - Found `Format: name@example.com`; Found `Preferred format: (123) 456-7890`
- ✅: Text inputs are keyboard focusable (R): pass
- ✅: Visual labels are defined and persistent (R): pass
- ❌: Required fields are indicated (visually and programmatically) (R): fail - Input is programmatically required but has no visual required indicator (reported twice)
- ❌: Inputs use appropriate autocomplete for purpose (R): fail
- ✅: Placeholder text is programmatically defined as a property (R): pass - No placeholder text present on text inputs
Axe Best Practice Issues (5) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- page-has-heading-one (moderate): Ensure that the page, or at least one of its frames contains a level-one heading (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 21 (GPT-5.2 Codex)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 0 | BP: 5
Assertions ❌
- ✅: Each text input has an accessible name (R): pass
- ✅: Visible label is included in accessible name (R): pass
- ✅: Each text input has textbox role (R): pass - Text input fields with textbox role found
- ❌: Helper text is programmatically associated (R): fail - Found `Please enter a valid email address (e.g., name@example.com).`; Found `Preferred format: (123) 456-7890.`
- ✅: Text inputs are keyboard focusable (R): pass
- ✅: Visual labels are defined and persistent (R): pass
- ❌: Required fields are indicated (visually and programmatically) (R): fail - Input is programmatically required but has no visual required indicator (reported twice)
- ❌: Inputs use appropriate autocomplete for purpose (R): fail
- ✅: Placeholder text is programmatically defined as a property (R): pass - No placeholder text present on text inputs
Axe Best Practice Issues (5) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- page-has-heading-one (moderate): Ensure that the page, or at least one of its frames contains a level-one heading (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 22 (GPT-5.2 Codex)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 0 | BP: 5
Assertions ❌
- ✅: Each text input has an accessible name (R): pass
- ✅: Visible label is included in accessible name (R): pass
- ✅: Each text input has textbox role (R): pass - Text input fields with textbox role found
- ❌: Helper text is programmatically associated (R): fail - Found `Please enter a valid email address (e.g., name@example.com).`; Found `Preferred format: (123) 456-7890.`
- ✅: Text inputs are keyboard focusable (R): pass
- ✅: Visual labels are defined and persistent (R): pass
- ❌: Required fields are indicated (visually and programmatically) (R): fail - Input is programmatically required but has no visual required indicator (reported twice)
- ❌: Inputs use appropriate autocomplete for purpose (R): fail
- ✅: Placeholder text is programmatically defined as a property (R): pass - No placeholder text present on text inputs
Axe Best Practice Issues (5) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- page-has-heading-one (moderate): Ensure that the page, or at least one of its frames contains a level-one heading (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 23 (GPT-5.2 Codex)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 0 | BP: 5
Assertions ❌
- ✅: Each text input has an accessible name (R): pass
- ✅: Visible label is included in accessible name (R): pass
- ✅: Each text input has textbox role (R): pass - Text input fields with textbox role found
- ❌: Helper text is programmatically associated (R): fail - Found `Expected format: name@example.com`; Found `Preferred format: (123) 456-7890`
- ✅: Text inputs are keyboard focusable (R): pass
- ✅: Visual labels are defined and persistent (R): pass
- ❌: Required fields are indicated (visually and programmatically) (R): fail - Input is programmatically required but has no visual required indicator (reported twice)
- ❌: Inputs use appropriate autocomplete for purpose (R): fail
- ✅: Placeholder text is programmatically defined as a property (R): pass - No placeholder text present on text inputs
Axe Best Practice Issues (5) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- page-has-heading-one (moderate): Ensure that the page, or at least one of its frames contains a level-one heading (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 24 (GPT-5.2 Codex)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 0 | BP: 5
Assertions ❌
- ✅: Each text input has an accessible name (R): pass
- ✅: Visible label is included in accessible name (R): pass
- ✅: Each text input has textbox role (R): pass - Text input fields with textbox role found
- ❌: Helper text is programmatically associated (R): fail - Found `Format: name@example.com`; Found `Preferred format: (123) 456-7890`
- ✅: Text inputs are keyboard focusable (R): pass
- ✅: Visual labels are defined and persistent (R): pass
- ❌: Required fields are indicated (visually and programmatically) (R): fail - Input is programmatically required but has no visual required indicator (reported twice)
- ❌: Inputs use appropriate autocomplete for purpose (R): fail
- ✅: Placeholder text is programmatically defined as a property (R): pass - No placeholder text present on text inputs
Axe Best Practice Issues (5) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- page-has-heading-one (moderate): Ensure that the page, or at least one of its frames contains a level-one heading (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
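The Control samples above fail the same three assertions: helper text is not programmatically associated, required fields have no visual indicator, and inputs lack an appropriate `autocomplete` value. The sketch below shows one way markup could address all three. It is an illustration only, not the markup any model produced and not the harness's reference solution. The function name `render_text_field`, the `-hint` id suffix, the `field` class, and the choice of a visible "(required)" label suffix are assumptions made for this example; whether the harness accepts that particular indicator is not confirmed by the report.

```python
from html import escape


def render_text_field(field_id, label, input_type, autocomplete, helper_text, required=True):
    """Return HTML for one labelled input (hypothetical helper for illustration).

    - aria-describedby ties the helper text to the input.
    - The `required` attribute exposes the state programmatically, and the
      visible "(required)" suffix exposes it visually.
    - autocomplete names the purpose of the field.
    """
    safe_id = escape(field_id)
    hint_id = f"{safe_id}-hint"
    # Visible indicator is an assumption; an asterisk with a legend is another common choice.
    marker = " (required)" if required else ""
    required_attr = " required" if required else ""
    return (
        '<div class="field">\n'
        f'  <label for="{safe_id}">{escape(label)}{marker}</label>\n'
        f'  <input id="{safe_id}" name="{safe_id}" type="{escape(input_type)}" '
        f'autocomplete="{escape(autocomplete)}" aria-describedby="{hint_id}"{required_attr}>\n'
        f'  <p id="{hint_id}">{escape(helper_text)}</p>\n'
        "</div>"
    )


if __name__ == "__main__":
    # Helper strings taken from the failure details reported above.
    print(render_text_field("email", "Email", "email", "email", "Format: name@example.com"))
    print(render_text_field("phone", "Phone", "tel", "tel", "Preferred format: (123) 456-7890"))
```

The `email` and `tel` values are standard HTML autocomplete tokens. The field types are inferred from the helper strings in the failure details, so treat them as a guess about the test form rather than a description of it.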
Sample 0 (GPT-5.2 Codex)
Instruction set: 1. Basic
PASS | Latency 0.00s
Axe WCAG: 0
Assertions ✅
- ✅: Each text input has an accessible name (R): pass
- ✅: Visible label is included in accessible name (R): pass
- ✅: Each text input has textbox role (R): pass - Text input fields with textbox role found
- ✅: Helper text is programmatically associated (R): pass
- ✅: Text inputs are keyboard focusable (R): pass
- ✅: Visual labels are defined and persistent (R): pass
- ✅: Required fields are indicated (visually and programmatically) (R): pass
- ✅: Inputs use appropriate autocomplete for purpose (R): pass
- ✅: Placeholder text is programmatically defined as a property (R): pass - No placeholder text present on text inputs
Sample 1 (GPT-5.2 Codex)
Instruction set: 1. Basic
PASS | Latency 0.00s
Axe WCAG: 0
Assertions ✅
- ✅: Each text input has an accessible name (R): pass
- ✅: Visible label is included in accessible name (R): pass
- ✅: Each text input has textbox role (R): pass - Text input fields with textbox role found
- ✅: Helper text is programmatically associated (R): pass
- ✅: Text inputs are keyboard focusable (R): pass
- ✅: Visual labels are defined and persistent (R): pass
- ✅: Required fields are indicated (visually and programmatically) (R): pass
- ✅: Inputs use appropriate autocomplete for purpose (R): pass
- ✅: Placeholder text is programmatically defined as a property (R): pass - No placeholder text present on text inputs
Sample 2 (GPT-5.2 Codex)
Instruction set: 1. Basic
PASS | Latency 0.00s
Axe WCAG: 0
Assertions ✅
- ✅: Each text input has an accessible name (R): pass
- ✅: Visible label is included in accessible name (R): pass
- ✅: Each text input has textbox role (R): pass - Text input fields with textbox role found
- ✅: Helper text is programmatically associated (R): pass
- ✅: Text inputs are keyboard focusable (R): pass
- ✅: Visual labels are defined and persistent (R): pass
- ✅: Required fields are indicated (visually and programmatically) (R): pass
- ✅: Inputs use appropriate autocomplete for purpose (R): pass
- ✅: Placeholder text is programmatically defined as a property (R): pass - No placeholder text present on text inputs
Sample 3 (GPT-5.2 Codex)
Instruction set: 1. Basic
PASS | Latency 0.00s
Axe WCAG: 0
Assertions ✅
- ✅: Each text input has an accessible name (R): pass
- ✅: Visible label is included in accessible name (R): pass
- ✅: Each text input has textbox role (R): pass - Text input fields with textbox role found
- ✅: Helper text is programmatically associated (R): pass
- ✅: Text inputs are keyboard focusable (R): pass
- ✅: Visual labels are defined and persistent (R): pass
- ✅: Required fields are indicated (visually and programmatically) (R): pass
- ✅: Inputs use appropriate autocomplete for purpose (R): pass
- ✅: Placeholder text is programmatically defined as a property (R): pass - No placeholder text present on text inputs
Sample 4 (GPT-5.2 Codex)
Instruction set: 1. Basic
PASS | Latency 0.00s
Axe WCAG: 0
Assertions ✅
- ✅: Each text input has an accessible name (R): pass
- ✅: Visible label is included in accessible name (R): pass
- ✅: Each text input has textbox role (R): pass - Text input fields with textbox role found
- ✅: Helper text is programmatically associated (R): pass
- ✅: Text inputs are keyboard focusable (R): pass
- ✅: Visual labels are defined and persistent (R): pass
- ✅: Required fields are indicated (visually and programmatically) (R): pass
- ✅: Inputs use appropriate autocomplete for purpose (R): pass
- ✅: Placeholder text is programmatically defined as a property (R): pass - No placeholder text present on text inputs
Sample 0 (GPT-5.2 Codex)
Instruction set: 0. Minimal
PASS | Latency 0.00s
Axe WCAG: 0 | BP: 5
Assertions ✅
- ✅: Each text input has an accessible name (R): pass
- ✅: Visible label is included in accessible name (R): pass
- ✅: Each text input has textbox role (R): pass - Text input fields with textbox role found
- ✅: Helper text is programmatically associated (R): pass
- ✅: Text inputs are keyboard focusable (R): pass
- ✅: Visual labels are defined and persistent (R): pass
- ✅: Required fields are indicated (visually and programmatically) (R): pass
- ✅: Inputs use appropriate autocomplete for purpose (R): pass
- ✅: Placeholder text is programmatically defined as a property (R): pass - No placeholder text present on text inputs
Axe Best Practice Issues (5) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- page-has-heading-one (moderate): Ensure that the page, or at least one of its frames contains a level-one heading (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 1 (GPT-5.2 Codex)
Instruction set: 0. Minimal
PASS | Latency 0.00s
Axe WCAG: 0
Assertions ✅
- ✅: Each text input has an accessible name (R): pass
- ✅: Visible label is included in accessible name (R): pass
- ✅: Each text input has textbox role (R): pass - Text input fields with textbox role found
- ✅: Helper text is programmatically associated (R): pass
- ✅: Text inputs are keyboard focusable (R): pass
- ✅: Visual labels are defined and persistent (R): pass
- ✅: Required fields are indicated (visually and programmatically) (R): pass
- ✅: Inputs use appropriate autocomplete for purpose (R): pass
- ✅: Placeholder text is programmatically defined as a property (R): pass - No placeholder text present on text inputs
Sample 2 (GPT-5.2 Codex)
Instruction set: 0. Minimal
FAIL | Latency 0.00s
Axe WCAG: 0 | BP: 1
Assertions ❌
- ✅: Each text input has an accessible name (R): pass
- ✅: Visible label is included in accessible name (R): pass
- ✅: Each text input has textbox role (R): pass - Text input fields with textbox role found
- ✅: Helper text is programmatically associated (R): pass
- ✅: Text inputs are keyboard focusable (R): pass
- ✅: Visual labels are defined and persistent (R): pass
- ❌: Required fields are indicated (visually and programmatically) (R): fail - Input is programmatically required but has no visual required indicator (reported twice)
- ✅: Inputs use appropriate autocomplete for purpose (R): pass
- ✅: Placeholder text is programmatically defined as a property (R): pass - No placeholder text present on text inputs
Axe Best Practice Issues (1) ⚠️
- page-has-heading-one (moderate): Ensure that the page, or at least one of its frames contains a level-one heading (Best Practice - does not affect pass/fail)
Sample 3 (GPT-5.2 Codex)
Instruction set: 0. Minimal
PASS | Latency 0.00s
Axe WCAG: 0
Assertions ✅
- ✅: Each text input has an accessible name (R): pass
- ✅: Visible label is included in accessible name (R): pass
- ✅: Each text input has textbox role (R): pass - Text input fields with textbox role found
- ✅: Helper text is programmatically associated (R): pass
- ✅: Text inputs are keyboard focusable (R): pass
- ✅: Visual labels are defined and persistent (R): pass
- ✅: Required fields are indicated (visually and programmatically) (R): pass
- ✅: Inputs use appropriate autocomplete for purpose (R): pass
- ✅: Placeholder text is programmatically defined as a property (R): pass - No placeholder text present on text inputs
Sample 4 (GPT-5.2 Codex)
Instruction set: 0. Minimal
FAIL | Latency 0.00s
Axe WCAG: 0 | BP: 1
Assertions ❌
- ✅: Each text input has an accessible name (R): pass
- ✅: Visible label is included in accessible name (R): pass
- ✅: Each text input has textbox role (R): pass - Text input fields with textbox role found
- ✅: Helper text is programmatically associated (R): pass
- ✅: Text inputs are keyboard focusable (R): pass
- ✅: Visual labels are defined and persistent (R): pass
- ❌: Required fields are indicated (visually and programmatically) (R): fail - Input is programmatically required but has no visual required indicator (reported twice)
- ✅: Inputs use appropriate autocomplete for purpose (R): pass
- ✅: Placeholder text is programmatically defined as a property (R): pass - No placeholder text present on text inputs
Axe Best Practice Issues (1) ⚠️
- page-has-heading-one (moderate): Ensure that the page, or at least one of its frames contains a level-one heading (Best Practice - does not affect pass/fail)
Sample 0 (GPT-5.2 Codex)
Instruction set: 2. Detailed Instructions
PASS | Latency 0.00s
Axe WCAG: 0
Assertions ✅
- ✅: Each text input has an accessible name (R): pass
- ✅: Visible label is included in accessible name (R): pass
- ✅: Each text input has textbox role (R): pass - Text input fields with textbox role found
- ✅: Helper text is programmatically associated (R): pass
- ✅: Text inputs are keyboard focusable (R): pass
- ✅: Visual labels are defined and persistent (R): pass
- ✅: Required fields are indicated (visually and programmatically) (R): pass
- ✅: Inputs use appropriate autocomplete for purpose (R): pass
- ✅: Placeholder text is programmatically defined as a property (R): pass - No placeholder text present on text inputs
Sample 1 (GPT-5.2 Codex)
Instruction set: 2. Detailed Instructions
FAIL | Latency 0.00s
Axe WCAG: 0
Assertions ❌
- ✅: Each text input has an accessible name (R): pass
- ✅: Visible label is included in accessible name (R): pass
- ✅: Each text input has textbox role (R): pass - Text input fields with textbox role found
- ✅: Helper text is programmatically associated (R): pass
- ✅: Text inputs are keyboard focusable (R): pass
- ✅: Visual labels are defined and persistent (R): pass
- ✅: Required fields are indicated (visually and programmatically) (R): pass
- ❌: Inputs use appropriate autocomplete for purpose (R): fail
- ✅: Placeholder text is programmatically defined as a property (R): pass - No placeholder text present on text inputs
Sample 2 (GPT-5.2 Codex)
Instruction set: 2. Detailed Instructions
FAIL | Latency 0.00s
Axe WCAG: 0
Assertions ❌
- ✅: Each text input has an accessible name (R): pass
- ✅: Visible label is included in accessible name (R): pass
- ✅: Each text input has textbox role (R): pass - Text input fields with textbox role found
- ✅: Helper text is programmatically associated (R): pass
- ✅: Text inputs are keyboard focusable (R): pass
- ✅: Visual labels are defined and persistent (R): pass
- ✅: Required fields are indicated (visually and programmatically) (R): pass
- ❌: Inputs use appropriate autocomplete for purpose (R): fail
- ✅: Placeholder text is programmatically defined as a property (R): pass - No placeholder text present on text inputs
Sample 3 (GPT-5.2 Codex)
Instruction set: 2. Detailed Instructions
PASS | Latency 0.00s
Axe WCAG: 0
Assertions ✅
- ✅: Each text input has an accessible name (R): pass
- ✅: Visible label is included in accessible name (R): pass
- ✅: Each text input has textbox role (R): pass - Text input fields with textbox role found
- ✅: Helper text is programmatically associated (R): pass
- ✅: Text inputs are keyboard focusable (R): pass
- ✅: Visual labels are defined and persistent (R): pass
- ✅: Required fields are indicated (visually and programmatically) (R): pass
- ✅: Inputs use appropriate autocomplete for purpose (R): pass
- ✅: Placeholder text is programmatically defined as a property (R): pass - No placeholder text present on text inputs
Sample 4 (GPT-5.2 Codex)
Instruction set: 2. Detailed Instructions
PASS | Latency 0.00s
Axe WCAG: 0
Assertions ✅
- ✅: Each text input has an accessible name (R): pass
- ✅: Visible label is included in accessible name (R): pass
- ✅: Each text input has textbox role (R): pass - Text input fields with textbox role found
- ✅: Helper text is programmatically associated (R): pass
- ✅: Text inputs are keyboard focusable (R): pass
- ✅: Visual labels are defined and persistent (R): pass
- ✅: Required fields are indicated (visually and programmatically) (R): pass
- ✅: Inputs use appropriate autocomplete for purpose (R): pass
- ✅: Placeholder text is programmatically defined as a property (R): pass - No placeholder text present on text inputs
Grok 4 Fast Non-Reasoning
— 0%
— 0%
— 0%
— 0%
Aggregates are shown when filtering to a specific instruction set.
Samples: 25 | Passes: 0
| pass@1 | pass@5 | pass@10 |
|---|---|---|
| 0% | 0% | 0% |
Samples: 5 | Passes: 0
| pass@1 | pass@5 | pass@10 |
|---|---|---|
| 0% | 0% | 0% |
Samples: 5 | Passes: 0
| pass@1 | pass@5 | pass@10 |
|---|---|---|
| 0% | 0% | 0% |
Samples: 5 | Passes: 0
| pass@1 | pass@5 | pass@10 |
|---|---|---|
| 0% | 0% | 0% |
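The report does not state how the pass@k figures are computed. The sketch below assumes the commonly used unbiased estimator, 1 - C(n-c, k) / C(n, k), for n samples with c passes. The function name `pass_at_k` and the handling of edge cases (zero passes, k larger than the sample count) are assumptions made for illustration, not the benchmark's own code.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Estimate the probability that at least one of k samples, drawn
    without replacement from n samples containing c passes, is a pass.

    Assumed formula: 1 - C(n - c, k) / C(n, k).
    """
    if n < 1 or k < 1 or not 0 <= c <= n:
        raise ValueError("expected n >= 1, k >= 1 and 0 <= c <= n")
    if c == 0:
        # No passing sample exists, so no draw can contain one.
        # This also covers k > n (for example 5 samples and pass@10).
        return 0.0
    if n - c < k:
        # Fewer than k failing samples: every draw of k includes a pass.
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)


if __name__ == "__main__":
    # Sample and pass counts as reported in the aggregate tables above.
    for samples, passes in [(25, 0), (5, 0)]:
        rates = [pass_at_k(samples, passes, k) for k in (1, 5, 10)]
        print(samples, passes, [f"{rate:.0%}" for rate in rates])
```

With 0 passes the estimate is 0% for every k, as in the tables above. Under the same assumption, 25 samples with 9 passes come out at roughly 36%, 92%, and 100%, which agrees with the corresponding row of the summary pass@k table.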
Sample 0 (Grok 4 Fast Non-Reasoning)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 1 | BP: 5
Assertions ❌
- ✅: Each text input has an accessible name (R): pass
- ✅: Visible label is included in accessible name (R): pass
- ✅: Each text input has textbox role (R): pass - Text input fields with textbox role found
- ❌: Helper text is programmatically associated (R): fail - Found `Please enter a valid email address (e.g., name@example.com).` Found `Preferred format: (123) 456-7890 (optional).`
- ✅: Text inputs are keyboard focusable (R): pass
- ✅: Visual labels are defined and persistent (R): pass
- ❌: Required fields are indicated (visually and programmatically) (R): fail - Input is programmatically required but has no visual required indicator (reported twice)
- ❌: Inputs use appropriate autocomplete for purpose (R): fail
- ✅: Placeholder text is programmatically defined as a property (R): pass - No placeholder text present on text inputs
Axe WCAG Failures (1) ❌
- (1x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (5) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- page-has-heading-one (moderate): Ensure that the page, or at least one of its frames contains a level-one heading (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 1 (Grok 4 Fast Non-Reasoning)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 1 | BP: 5
Assertions ❌
- ✅: Each text input has an accessible name (R): pass
- ✅: Visible label is included in accessible name (R): pass
- ✅: Each text input has textbox role (R): pass - Text input fields with textbox role found
- ❌: Helper text is programmatically associated (R): fail - Found `Please enter a valid email address (e.g., user@example.com).` Found `Preferred format: (123) 456-7890 (optional).`
- ✅: Text inputs are keyboard focusable (R): pass
- ✅: Visual labels are defined and persistent (R): pass
- ✅: Required fields are indicated (visually and programmatically) (R): pass
- ❌: Inputs use appropriate autocomplete for purpose (R): fail
- ✅: Placeholder text is programmatically defined as a property (R): pass - No placeholder text present on text inputs
Axe WCAG Failures (1) ❌
- (1x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (5) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 2 (Grok 4 Fast Non-Reasoning)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 1 | BP: 5
Assertions ❌
- ✅: Each text input has an accessible name (R): pass
- ✅: Visible label is included in accessible name (R): pass
- ✅: Each text input has textbox role (R): pass - Text input fields with textbox role found
- ❌: Helper text is programmatically associated (R): fail - Found `Please enter a valid email address (e.g., user@example.com).` Found `Preferred format: (123) 456-7890`
- ✅: Text inputs are keyboard focusable (R): pass
- ✅: Visual labels are defined and persistent (R): pass
- ✅: Required fields are indicated (visually and programmatically) (R): pass
- ❌: Inputs use appropriate autocomplete for purpose (R): fail
- ✅: Placeholder text is programmatically defined as a property (R): pass - No placeholder text present on text inputs
Axe WCAG Failures (1) ❌
- (1x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (5) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 3 (Grok 4 Fast Non-Reasoning)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 1 | BP: 5
Assertions ❌
- ✅: Each text input has an accessible name (R): pass
- ✅: Visible label is included in accessible name (R): pass
- ✅: Each text input has textbox role (R): pass - Text input fields with textbox role found
- ❌: Helper text is programmatically associated (R): fail - Found `Please enter a valid email address (e.g., user@example.com).` Found `Preferred format: (123) 456-7890 (optional).`
- ✅: Text inputs are keyboard focusable (R): pass
- ✅: Visual labels are defined and persistent (R): pass
- ✅: Required fields are indicated (visually and programmatically) (R): pass
- ❌: Inputs use appropriate autocomplete for purpose (R): fail
- ✅: Placeholder text is programmatically defined as a property (R): pass - No placeholder text present on text inputs
Axe WCAG Failures (1) ❌
- (1x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (5) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 4 (Grok 4 Fast Non-Reasoning)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 1 | BP: 5
Assertions ❌
- ✅: Each text input has an accessible name (R): pass
- ✅: Visible label is included in accessible name (R): pass
- ✅: Each text input has textbox role (R): pass - Text input fields with textbox role found
- ❌: Helper text is programmatically associated (R): fail - Found `Please enter a valid email address (e.g., name@example.com).` Found `Preferred format: (123) 456-7890 (optional).`
- ✅: Text inputs are keyboard focusable (R): pass
- ✅: Visual labels are defined and persistent (R): pass
- ✅: Required fields are indicated (visually and programmatically) (R): pass
- ❌: Inputs use appropriate autocomplete for purpose (R): fail
- ✅: Placeholder text is programmatically defined as a property (R): pass - No placeholder text present on text inputs
Axe WCAG Failures (1) ❌
- (1x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (5) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 5 (Grok 4 Fast Non-Reasoning)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 1 | BP: 5
Assertions ❌
- ✅: Each text input has an accessible name (R): pass
- ✅: Visible label is included in accessible name (R): pass
- ✅: Each text input has textbox role (R): pass - Text input fields with textbox role found
- ❌: Helper text is programmatically associated (R): fail - Found `Please enter a valid email address (e.g., user@example.com).` Found `Preferred format: (123) 456-7890 (optional).`
- ✅: Text inputs are keyboard focusable (R): pass
- ✅: Visual labels are defined and persistent (R): pass
- ❌: Required fields are indicated (visually and programmatically) (R): fail - Input is programmatically required but has no visual required indicator (reported twice)
- ❌: Inputs use appropriate autocomplete for purpose (R): fail
- ✅: Placeholder text is programmatically defined as a property (R): pass - No placeholder text present on text inputs
Axe WCAG Failures (1) ❌
- (1x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (5) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- page-has-heading-one (moderate): Ensure that the page, or at least one of its frames contains a level-one heading (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 6 (Grok 4 Fast Non-Reasoning)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 1 | BP: 5
Assertions ❌
- ✅: Each text input has an accessible name (R): pass
- ✅: Visible label is included in accessible name (R): pass
- ✅: Each text input has textbox role (R): pass - Text input fields with textbox role found
- ❌: Helper text is programmatically associated (R): fail - Found `Please enter a valid email address (e.g., user@example.com).` Found `Preferred format: (123) 456-7890 (optional).`
- ✅: Text inputs are keyboard focusable (R): pass
- ✅: Visual labels are defined and persistent (R): pass
- ❌: Required fields are indicated (visually and programmatically) (R): fail - Input is programmatically required but has no visual required indicator (reported twice)
- ❌: Inputs use appropriate autocomplete for purpose (R): fail
- ✅: Placeholder text is programmatically defined as a property (R): pass - No placeholder text present on text inputs
Axe WCAG Failures (1) ❌
- (1x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (5) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- page-has-heading-one (moderate): Ensure that the page, or at least one of its frames contains a level-one heading (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 7 (Grok 4 Fast Non-Reasoning)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 1 | BP: 5
Assertions ❌
- ✅: Each text input has an accessible name (R): pass
- ✅: Visible label is included in accessible name (R): pass
- ✅: Each text input has textbox role (R): pass - Text input fields with textbox role found
- ❌: Helper text is programmatically associated (R): fail - Found `Please enter a valid email address (e.g., name@example.com).` Found `Preferred format: (123) 456-7890`
- ✅: Text inputs are keyboard focusable (R): pass
- ✅: Visual labels are defined and persistent (R): pass
- ✅: Required fields are indicated (visually and programmatically) (R): pass
- ❌: Inputs use appropriate autocomplete for purpose (R): fail
- ✅: Placeholder text is programmatically defined as a property (R): pass - No placeholder text present on text inputs
Axe WCAG Failures (1) ❌
- (1x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (5) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 8 (Grok 4 Fast Non-Reasoning)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 1 | BP: 5
Assertions ❌
- ✅: Each text input has an accessible name (R): pass
- ✅: Visible label is included in accessible name (R): pass
- ✅: Each text input has textbox role (R): pass - Text input fields with textbox role found
- ❌: Helper text is programmatically associated (R): fail - Found `Please enter a valid email address (e.g., user@example.com).` Found `Preferred format: (123) 456-7890 (optional).`
- ✅: Text inputs are keyboard focusable (R): pass
- ✅: Visual labels are defined and persistent (R): pass
- ✅: Required fields are indicated (visually and programmatically) (R): pass
- ❌: Inputs use appropriate autocomplete for purpose (R): fail
- ✅: Placeholder text is programmatically defined as a property (R): pass - No placeholder text present on text inputs
Axe WCAG Failures (1) ❌
- (1x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (5) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 9 (Grok 4 Fast Non-Reasoning)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 1 | BP: 5
Assertions ❌
- ✅: Each text input has an accessible name (R): pass
- ✅: Visible label is included in accessible name (R): pass
- ✅: Each text input has textbox role (R): pass - Text input fields with textbox role found
- ❌: Helper text is programmatically associated (R): fail - Found `Please enter a valid email address (e.g., user@example.com).` Found `Preferred format: (123) 456-7890 (optional).`
- ✅: Text inputs are keyboard focusable (R): pass
- ✅: Visual labels are defined and persistent (R): pass
- ✅: Required fields are indicated (visually and programmatically) (R): pass
- ❌: Inputs use appropriate autocomplete for purpose (R): fail
- ✅: Placeholder text is programmatically defined as a property (R): pass - No placeholder text present on text inputs
Axe WCAG Failures (1) ❌
- (1x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (5) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 10 (Grok 4 Fast Non-Reasoning)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 1 | BP: 5
Assertions ❌
- ✅: Each text input has an accessible name (R): pass
- ✅: Visible label is included in accessible name (R): pass
- ✅: Each text input has textbox role (R): pass - Text input fields with textbox role found
- ❌: Helper text is programmatically associated (R): fail - Found `Please enter a valid email address (e.g., user@example.com).` Found `Preferred format: (123) 456-7890 (optional).`
- ✅: Text inputs are keyboard focusable (R): pass
- ✅: Visual labels are defined and persistent (R): pass
- ✅: Required fields are indicated (visually and programmatically) (R): pass
- ❌: Inputs use appropriate autocomplete for purpose (R): fail
- ✅: Placeholder text is programmatically defined as a property (R): pass - No placeholder text present on text inputs
Axe WCAG Failures (1) ❌
- (1x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (5) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 11 (Grok 4 Fast Non-Reasoning)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 1 | BP: 5
Assertions ❌
- ✅: Each text input has an accessible name (R): pass
- ✅: Visible label is included in accessible name (R): pass
- ✅: Each text input has textbox role (R): pass - Text input fields with textbox role found
- ❌: Helper text is programmatically associated (R): fail - Found `Please enter a valid email address (e.g., user@example.com).` Found `Preferred format: (123) 456-7890`
- ✅: Text inputs are keyboard focusable (R): pass
- ✅: Visual labels are defined and persistent (R): pass
- ✅: Required fields are indicated (visually and programmatically) (R): pass
- ❌: Inputs use appropriate autocomplete for purpose (R): fail
- ✅: Placeholder text is programmatically defined as a property (R): pass - No placeholder text present on text inputs
Axe WCAG Failures (1) ❌
- (1x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (5) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 12 (Grok 4 Fast Non-Reasoning)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 1 | BP: 5
Assertions ❌
- ✅: Each text input has an accessible name (R): pass
- ✅: Visible label is included in accessible name (R): pass
- ✅: Each text input has textbox role (R): pass - Text input fields with textbox role found
- ❌: Helper text is programmatically associated (R): fail - Found `Please enter a valid email address (e.g., name@example.com).` Found `Preferred format: (123) 456-7890 (optional).`
- ✅: Text inputs are keyboard focusable (R): pass
- ✅: Visual labels are defined and persistent (R): pass
- ✅: Required fields are indicated (visually and programmatically) (R): pass
- ❌: Inputs use appropriate autocomplete for purpose (R): fail
- ✅: Placeholder text is programmatically defined as a property (R): pass - No placeholder text present on text inputs
Axe WCAG Failures (1) ❌
- (1x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (5) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 13 (Grok 4 Fast Non-Reasoning)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 1 | BP: 5
Assertions ❌
- ✅: Each text input has an accessible name (R): pass
- ✅: Visible label is included in accessible name (R): pass
- ✅: Each text input has textbox role (R): pass - Text input fields with textbox role found
- ❌: Helper text is programmatically associated (R): fail - Found `Please enter a valid email address (e.g., user@example.com).` Found `Preferred format: (123) 456-7890 (optional).`
- ✅: Text inputs are keyboard focusable (R): pass
- ✅: Visual labels are defined and persistent (R): pass
- ✅: Required fields are indicated (visually and programmatically) (R): pass
- ❌: Inputs use appropriate autocomplete for purpose (R): fail
- ✅: Placeholder text is programmatically defined as a property (R): pass - No placeholder text present on text inputs
Axe WCAG Failures (1) ❌
- (1x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (5) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 14 (Grok 4 Fast Non-Reasoning)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 1 | BP: 5
Assertions ❌
- ✅: Each text input has an accessible name (R): pass
- ✅: Visible label is included in accessible name (R): pass
- ✅: Each text input has textbox role (R): pass - Text input fields with textbox role found
- ❌: Helper text is programmatically associated (R): fail - Found `Please enter a valid email address (e.g., name@example.com).` Found `Preferred format: (123) 456-7890 (optional).`
- ✅: Text inputs are keyboard focusable (R): pass
- ✅: Visual labels are defined and persistent (R): pass
- ✅: Required fields are indicated (visually and programmatically) (R): pass
- ❌: Inputs use appropriate autocomplete for purpose (R): fail
- ✅: Placeholder text is programmatically defined as a property (R): pass - No placeholder text present on text inputs
Axe WCAG Failures (1) ❌
- (1x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (5) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 15 (Grok 4 Fast Non-Reasoning)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 1 | BP: 5
Assertions ❌
- ✅: Each text input has an accessible name (R): pass
- ✅: Visible label is included in accessible name (R): pass
- ✅: Each text input has textbox role (R): pass - Text input fields with textbox role found
- ❌: Helper text is programmatically associated (R): fail - Found `Please enter a valid email address (e.g., user@example.com).` Found `Preferred format: (123) 456-7890 (optional).`
- ✅: Text inputs are keyboard focusable (R): pass
- ✅: Visual labels are defined and persistent (R): pass
- ✅: Required fields are indicated (visually and programmatically) (R): pass
- ❌: Inputs use appropriate autocomplete for purpose (R): fail
- ✅: Placeholder text is programmatically defined as a property (R): pass - No placeholder text present on text inputs
Axe WCAG Failures (1) ❌
- (1x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (5) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 16 (Grok 4 Fast Non-Reasoning)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 1 | BP: 5
Assertions ❌
- ✅: Each text input has an accessible name (R): pass
- ✅: Visible label is included in accessible name (R): pass
- ✅: Each text input has textbox role (R): pass - Text input fields with textbox role found
- ❌: Helper text is programmatically associated (R): fail - Found `Please enter a valid email address (e.g., name@example.com).` Found `Preferred format: (123) 456-7890`
- ✅: Text inputs are keyboard focusable (R): pass
- ✅: Visual labels are defined and persistent (R): pass
- ✅: Required fields are indicated (visually and programmatically) (R): pass
- ❌: Inputs use appropriate autocomplete for purpose (R): fail
- ✅: Placeholder text is programmatically defined as a property (R): pass - No placeholder text present on text inputs
Axe WCAG Failures (1) ❌
- (1x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (5) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 17 (Grok 4 Fast Non-Reasoning)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 1 | BP: 5
Assertions ❌
- ✅: Each text input has an accessible name (R): pass
- ✅: Visible label is included in accessible name (R): pass
- ✅: Each text input has textbox role (R): pass - Text input fields with textbox role found
- ❌: Helper text is programmatically associated (R): fail - Found `Please enter a valid email address (e.g., user@example.com).` Found `Preferred format: (123) 456-7890 (optional).`
- ✅: Text inputs are keyboard focusable (R): pass
- ✅: Visual labels are defined and persistent (R): pass
- ✅: Required fields are indicated (visually and programmatically) (R): pass
- ❌: Inputs use appropriate autocomplete for purpose (R): fail
- ✅: Placeholder text is programmatically defined as a property (R): pass - No placeholder text present on text inputs
Axe WCAG Failures (1) ❌
- (1x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (5) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 18 (Grok 4 Fast Non-Reasoning)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 1 | BP: 5
Assertions ❌
- ✅: Each text input has an accessible name (R): pass
- ✅: Visible label is included in accessible name (R): pass
- ✅: Each text input has textbox role (R): pass - Text input fields with textbox role found
- ❌: Helper text is programmatically associated (R): fail - Found `Please enter a valid email address (e.g., user@example.com).` Found `Preferred format: (123) 456-7890 (optional).`
- ✅: Text inputs are keyboard focusable (R): pass
- ✅: Visual labels are defined and persistent (R): pass
- ✅: Required fields are indicated (visually and programmatically) (R): pass
- ❌: Inputs use appropriate autocomplete for purpose (R): fail
- ✅: Placeholder text is programmatically defined as a property (R): pass - No placeholder text present on text inputs
Axe WCAG Failures (1) ❌
- (1x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (5) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 19 (Grok 4 Fast Non-Reasoning)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 1 | BP: 5
Assertions ❌
- ✅: Each text input has an accessible name (R): pass
- ✅: Visible label is included in accessible name (R): pass
- ✅: Each text input has textbox role (R): pass - Text input fields with textbox role found
- ❌: Helper text is programmatically associated (R): fail - Found `Please enter a valid email address (e.g., user@example.com).` Found `Preferred format: (123) 456-7890 (optional).`
- ✅: Text inputs are keyboard focusable (R): pass
- ✅: Visual labels are defined and persistent (R): pass
- ❌: Required fields are indicated (visually and programmatically) (R): fail - (2x) Input is programmatically required but has no visual required indicator
- ❌: Inputs use appropriate autocomplete for purpose (R): fail
- ✅: Placeholder text is programmatically defined as a property (R): pass - No placeholder text present on text inputs
Axe WCAG Failures (1) ❌
- (1x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (5) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- page-has-heading-one (moderate): Ensure that the page, or at least one of its frames contains a level-one heading (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 20 (Grok 4 Fast Non-Reasoning)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 1 | BP: 5
Assertions ❌
- ✅: Each text input has an accessible name (R): pass
- ✅: Visible label is included in accessible name (R): pass
- ✅: Each text input has textbox role (R): pass - Text input fields with textbox role found
- ❌: Helper text is programmatically associated (R): fail - Found `Please enter a valid email address (e.g., name@example.com).` Found `Preferred format: (123) 456-7890`
- ✅: Text inputs are keyboard focusable (R): pass
- ✅: Visual labels are defined and persistent (R): pass
- ✅: Required fields are indicated (visually and programmatically) (R): pass
- ❌: Inputs use appropriate autocomplete for purpose (R): fail
- ✅: Placeholder text is programmatically defined as a property (R): pass - No placeholder text present on text inputs
Axe WCAG Failures (1) ❌
- (1x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (5) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- page-has-heading-one (moderate): Ensure that the page, or at least one of its frames contains a level-one heading (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 21 (Grok 4 Fast Non-Reasoning)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 1 | BP: 5
Assertions ❌
- ✅: Each text input has an accessible name (R): pass
- ✅: Visible label is included in accessible name (R): pass
- ✅: Each text input has textbox role (R): pass - Text input fields with textbox role found
- ❌: Helper text is programmatically associated (R): fail - Found `Please enter a valid email address (e.g., user@example.com).` Found `Preferred format: (123) 456-7890 (optional).`
- ✅: Text inputs are keyboard focusable (R): pass
- ✅: Visual labels are defined and persistent (R): pass
- ✅: Required fields are indicated (visually and programmatically) (R): pass
- ❌: Inputs use appropriate autocomplete for purpose (R): fail
- ✅: Placeholder text is programmatically defined as a property (R): pass - No placeholder text present on text inputs
Axe WCAG Failures (1) ❌
- (1x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (5) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- page-has-heading-one (moderate): Ensure that the page, or at least one of its frames contains a level-one heading (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 22 (Grok 4 Fast Non-Reasoning)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 1 | BP: 5
Assertions ❌
- ✅: Each text input has an accessible name (R): pass
- ✅: Visible label is included in accessible name (R): pass
- ✅: Each text input has textbox role (R): pass - Text input fields with textbox role found
- ❌: Helper text is programmatically associated (R): fail - Found `Please enter a valid email address (e.g., user@example.com).` Found `Preferred format: (123) 456-7890 (optional).`
- ✅: Text inputs are keyboard focusable (R): pass
- ✅: Visual labels are defined and persistent (R): pass
- ❌: Required fields are indicated (visually and programmatically) (R): fail - (2x) Input is programmatically required but has no visual required indicator
- ❌: Inputs use appropriate autocomplete for purpose (R): fail
- ✅: Placeholder text is programmatically defined as a property (R): pass - No placeholder text present on text inputs
Axe WCAG Failures (1) ❌
- (1x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (5) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- page-has-heading-one (moderate): Ensure that the page, or at least one of its frames contains a level-one heading (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 23 (Grok 4 Fast Non-Reasoning)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 1 | BP: 5
Assertions ❌
- ✅: Each text input has an accessible name (R): pass
- ✅: Visible label is included in accessible name (R): pass
- ✅: Each text input has textbox role (R): pass - Text input fields with textbox role found
- ❌: Helper text is programmatically associated (R): fail - Found `Please enter a valid email address (e.g., user@example.com).` Found `Preferred format: (123) 456-7890 (optional).`
- ✅: Text inputs are keyboard focusable (R): pass
- ✅: Visual labels are defined and persistent (R): pass
- ❌: Required fields are indicated (visually and programmatically) (R): fail - (2x) Input is programmatically required but has no visual required indicator
- ❌: Inputs use appropriate autocomplete for purpose (R): fail
- ✅: Placeholder text is programmatically defined as a property (R): pass - No placeholder text present on text inputs
Axe WCAG Failures (1) ❌
- (1x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (5) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- page-has-heading-one (moderate): Ensure that the page, or at least one of its frames contains a level-one heading (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 24 (Grok 4 Fast Non-Reasoning)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 1 | BP: 5
Assertions ❌
- ✅: Each text input has an accessible name (R): pass
- ✅: Visible label is included in accessible name (R): pass
- ✅: Each text input has textbox role (R): pass - Text input fields with textbox role found
- ❌: Helper text is programmatically associated (R): fail - Found `Please enter a valid email address (e.g., user@example.com).` Found `Preferred format: (123) 456-7890 (optional).`
- ✅: Text inputs are keyboard focusable (R): pass
- ✅: Visual labels are defined and persistent (R): pass
- ❌: Required fields are indicated (visually and programmatically) (R): fail - (2x) Input is programmatically required but has no visual required indicator
- ❌: Inputs use appropriate autocomplete for purpose (R): fail
- ✅: Placeholder text is programmatically defined as a property (R): pass - No placeholder text present on text inputs
Axe WCAG Failures (1) ❌
- (1x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (5) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- page-has-heading-one (moderate): Ensure that the page, or at least one of its frames contains a level-one heading (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
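The Control samples above repeatedly fail "Helper text is programmatically associated". One common way to associate a hint with its input is `aria-describedby` pointing at the hint's `id`. The following is a minimal sketch of such a check, not the benchmark's actual assertion: the names `DescribedByChecker` and `unassociated_inputs` and both markup snippets are hypothetical, and the sketch assumes every input is expected to have helper text.

```python
from html.parser import HTMLParser


class DescribedByChecker(HTMLParser):
    """Collects all element ids and each input's aria-describedby references."""

    def __init__(self):
        super().__init__()
        self.ids = set()
        self.inputs = []  # list of (input id, [referenced ids])

    def handle_starttag(self, tag, attrs):
        attr = dict(attrs)
        if attr.get("id"):
            self.ids.add(attr["id"])
        if tag == "input":
            refs = (attr.get("aria-describedby") or "").split()
            self.inputs.append((attr.get("id") or "?", refs))


def unassociated_inputs(html):
    """Return ids of inputs whose aria-describedby is missing or dangling."""
    checker = DescribedByChecker()
    checker.feed(html)
    return [
        name
        for name, refs in checker.inputs
        if not refs or any(ref not in checker.ids for ref in refs)
    ]


# Hypothetical markup: the hint is visible but not linked to the input.
FAILING = """
<label for="email">Email</label>
<input id="email" type="email" required>
<p class="hint">Please enter a valid email address (e.g., user@example.com).</p>
"""

# Same markup with the hint referenced through aria-describedby.
PASSING = """
<label for="email">Email</label>
<input id="email" type="email" required aria-describedby="email-hint">
<p id="email-hint">Please enter a valid email address (e.g., user@example.com).</p>
"""

print(unassociated_inputs(FAILING))
print(unassociated_inputs(PASSING))
```

The check runs after the whole document is parsed, so a hint that appears after its input in source order still resolves.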
Sample 0 (Grok 4 Fast Non-Reasoning)
Instruction set: 1. Basic
FAIL | Latency 0.00s
Axe WCAG: 0 | BP: 5
Assertions ❌
- ✅: Each text input has an accessible name (R): pass
- ✅: Visible label is included in accessible name (R): pass
- ✅: Each text input has textbox role (R): pass - Text input fields with textbox role found
- ✅: Helper text is programmatically associated (R): pass
- ✅: Text inputs are keyboard focusable (R): pass
- ✅: Visual labels are defined and persistent (R): pass
- ❌: Required fields are indicated (visually and programmatically) (R): fail - (2x) Input is programmatically required but has no visual required indicator
- ❌: Inputs use appropriate autocomplete for purpose (R): fail
- ✅: Placeholder text is programmatically defined as a property (R): pass - No placeholder text present on text inputs
Axe Best Practice Issues (5) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 1 (Grok 4 Fast Non-Reasoning)
Instruction set: 1. Basic
FAIL | Latency 0.00s
Axe WCAG: 1
Assertions ❌
- ✅: Each text input has an accessible name (R): pass
- ✅: Visible label is included in accessible name (R): pass
- ✅: Each text input has textbox role (R): pass - Text input fields with textbox role found
- ✅: Helper text is programmatically associated (R): pass
- ✅: Text inputs are keyboard focusable (R): pass
- ✅: Visual labels are defined and persistent (R): pass
- ✅: Required fields are indicated (visually and programmatically) (R): pass
- ❌: Inputs use appropriate autocomplete for purpose (R): fail
- ✅: Placeholder text is programmatically defined as a property (R): pass - No placeholder text present on text inputs
Axe WCAG Failures (1) ❌
- (1x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Sample 2 (Grok 4 Fast Non-Reasoning)
Instruction set: 1. Basic
FAIL | Latency 0.00s
Axe WCAG: 1 | BP: 5
Assertions ❌
- ✅: Each text input has an accessible name (R): pass
- ✅: Visible label is included in accessible name (R): pass
- ✅: Each text input has textbox role (R): pass - Text input fields with textbox role found
- ✅: Helper text is programmatically associated (R): pass
- ✅: Text inputs are keyboard focusable (R): pass
- ✅: Visual labels are defined and persistent (R): pass
- ❌: Required fields are indicated (visually and programmatically) (R): fail - (2x) Input is programmatically required but has no visual required indicator
- ❌: Inputs use appropriate autocomplete for purpose (R): fail
- ✅: Placeholder text is programmatically defined as a property (R): pass - No placeholder text present on text inputs
Axe WCAG Failures (1) ❌
- (1x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (5) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 3 (Grok 4 Fast Non-Reasoning)
Instruction set: 1. Basic
FAIL | Latency 0.00s
Axe WCAG: 1
Assertions ❌
- ✅: Each text input has an accessible name (R): pass
- ✅: Visible label is included in accessible name (R): pass
- ✅: Each text input has textbox role (R): pass - Text input fields with textbox role found
- ✅: Helper text is programmatically associated (R): pass
- ✅: Text inputs are keyboard focusable (R): pass
- ✅: Visual labels are defined and persistent (R): pass
- ✅: Required fields are indicated (visually and programmatically) (R): pass
- ❌: Inputs use appropriate autocomplete for purpose (R): fail
- ✅: Placeholder text is programmatically defined as a property (R): pass - No placeholder text present on text inputs
Axe WCAG Failures (1) ❌
- (1x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Sample 4 (Grok 4 Fast Non-Reasoning)
Instruction set: 1. Basic
FAIL | Latency 0.00s
Axe WCAG: 1
Assertions ❌
- ✅: Each text input has an accessible name (R): pass
- ✅: Visible label is included in accessible name (R): pass
- ✅: Each text input has textbox role (R): pass - Text input fields with textbox role found
- ✅: Helper text is programmatically associated (R): pass
- ✅: Text inputs are keyboard focusable (R): pass
- ✅: Visual labels are defined and persistent (R): pass
- ✅: Required fields are indicated (visually and programmatically) (R): pass
- ❌: Inputs use appropriate autocomplete for purpose (R): fail
- ✅: Placeholder text is programmatically defined as a property (R): pass - No placeholder text present on text inputs
Axe WCAG Failures (1) ❌
- (1x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Sample 0 (Grok 4 Fast Non-Reasoning)
Instruction set: 0. Minimal
FAIL | Latency 0.00s
Axe WCAG: 1 | BP: 5
Assertions ❌
- ✅: Each text input has an accessible name (R): pass
- ✅: Visible label is included in accessible name (R): pass
- ✅: Each text input has textbox role (R): pass - Text input fields with textbox role found
- ❌: Helper text is programmatically associated (R): fail - Found `Please enter a valid email address (e.g., user@example.com).` Found `Preferred format: (123) 456-7890 (optional).`
- ✅: Text inputs are keyboard focusable (R): pass
- ✅: Visual labels are defined and persistent (R): pass
- ✅: Required fields are indicated (visually and programmatically) (R): pass
- ❌: Inputs use appropriate autocomplete for purpose (R): fail
- ✅: Placeholder text is programmatically defined as a property (R): pass - No placeholder text present on text inputs
Axe WCAG Failures (1) ❌
- (1x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (5) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 1 (Grok 4 Fast Non-Reasoning)
Instruction set: 0. Minimal
FAIL | Latency 0.00s
Axe WCAG: 1 | BP: 5
Assertions ❌
- ✅: Each text input has an accessible name (R): pass
- ✅: Visible label is included in accessible name (R): pass
- ✅: Each text input has textbox role (R): pass - Text input fields with textbox role found
- ❌: Helper text is programmatically associated (R): fail - Found `Please enter a valid email address (e.g., user@example.com).` Found `Preferred format: (123) 456-7890 (optional).`
- ✅: Text inputs are keyboard focusable (R): pass
- ✅: Visual labels are defined and persistent (R): pass
- ✅: Required fields are indicated (visually and programmatically) (R): pass
- ❌: Inputs use appropriate autocomplete for purpose (R): fail
- ✅: Placeholder text is programmatically defined as a property (R): pass - No placeholder text present on text inputs
Axe WCAG Failures (1) ❌
- (1x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (5) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 2 (Grok 4 Fast Non-Reasoning)
Instruction set: 0. Minimal
FAIL | Latency 0.00s
Axe WCAG: 1 | BP: 5
Assertions ❌
- ✅: Each text input has an accessible name (R): pass
- ✅: Visible label is included in accessible name (R): pass
- ✅: Each text input has textbox role (R): pass - Text input fields with textbox role found
- ❌: Helper text is programmatically associated (R): fail - Found `Please enter a valid email address (e.g., user@example.com).` Found `Preferred format: (123) 456-7890 (optional).`
- ✅: Text inputs are keyboard focusable (R): pass
- ✅: Visual labels are defined and persistent (R): pass
- ✅: Required fields are indicated (visually and programmatically) (R): pass
- ❌: Inputs use appropriate autocomplete for purpose (R): fail
- ✅: Placeholder text is programmatically defined as a property (R): pass - No placeholder text present on text inputs
Axe WCAG Failures (1) ❌
- (1x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (5) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 3 (Grok 4 Fast Non-Reasoning)
Instruction set: 0. Minimal
FAIL | Latency 0.00s
Axe WCAG: 1 | BP: 5
Assertions ❌
- ✅: Each text input has an accessible name (R): pass
- ✅: Visible label is included in accessible name (R): pass
- ✅: Each text input has textbox role (R): pass - Text input fields with textbox role found
- ❌: Helper text is programmatically associated (R): fail - Found `Please enter a valid email address (e.g., user@example.com).` Found `Preferred format: (123) 456-7890 (optional).`
- ✅: Text inputs are keyboard focusable (R): pass
- ✅: Visual labels are defined and persistent (R): pass
- ✅: Required fields are indicated (visually and programmatically) (R): pass
- ❌: Inputs use appropriate autocomplete for purpose (R): fail
- ✅: Placeholder text is programmatically defined as a property (R): pass - No placeholder text present on text inputs
Axe WCAG Failures (1) ❌
- (1x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (5) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 4 (Grok 4 Fast Non-Reasoning)
Instruction set: 0. Minimal
FAIL | Latency 0.00s
Axe WCAG: 1 | BP: 5
Assertions ❌
- ✅: Each text input has an accessible name (R): pass
- ✅: Visible label is included in accessible name (R): pass
- ✅: Each text input has textbox role (R): pass - Text input fields with textbox role found
- ❌: Helper text is programmatically associated (R): fail - Found `Please enter a valid email address (e.g., user@example.com).` Found `Preferred format: (123) 456-7890 (optional).`
- ✅: Text inputs are keyboard focusable (R): pass
- ✅: Visual labels are defined and persistent (R): pass
- ✅: Required fields are indicated (visually and programmatically) (R): pass
- ❌: Inputs use appropriate autocomplete for purpose (R): fail
- ✅: Placeholder text is programmatically defined as a property (R): pass - No placeholder text present on text inputs
Axe WCAG Failures (1) ❌
- (1x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (5) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
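Nearly every sample in this section reports a `color-contrast` failure. The sketch below computes the WCAG 2 contrast ratio from two hex colors, using the relative luminance definition in WCAG 2.1 and the AA thresholds of 4.5:1 for normal text and 3:1 for large text. The function names are illustrative, and this is not how axe-core resolves the effective foreground and background colors on a rendered page.

```python
def _linearize(channel_8bit):
    """Convert an 8-bit sRGB channel to linear light (WCAG 2.1 definition)."""
    c = channel_8bit / 255
    return c / 12.92 if c <= 0.03928 else ((c + 0.055) / 1.055) ** 2.4


def relative_luminance(hex_color):
    """Relative luminance of a color given as '#rrggbb'."""
    hex_color = hex_color.lstrip("#")
    r, g, b = (int(hex_color[i:i + 2], 16) for i in (0, 2, 4))
    return 0.2126 * _linearize(r) + 0.7152 * _linearize(g) + 0.0722 * _linearize(b)


def contrast_ratio(foreground, background):
    """WCAG contrast ratio, from 1.0 (identical) up to 21.0 (black on white)."""
    l1 = relative_luminance(foreground)
    l2 = relative_luminance(background)
    lighter, darker = max(l1, l2), min(l1, l2)
    return (lighter + 0.05) / (darker + 0.05)


def meets_aa(foreground, background, large_text=False):
    """True if the pair meets the WCAG 2 AA minimum for the given text size."""
    return contrast_ratio(foreground, background) >= (3.0 if large_text else 4.5)


print(round(contrast_ratio("#767676", "#ffffff"), 2))
print(meets_aa("#777777", "#ffffff"))
```

Light grey hint text on a white background is a typical way to land just under the 4.5:1 threshold, as the `#777777` case illustrates.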
Sample 0 (Grok 4 Fast Non-Reasoning)
Instruction set: 2. Detailed Instructions
FAIL | Latency 0.00s
Axe WCAG: 1
Assertions ❌
- ✅: Each text input has an accessible name (R): pass
- ✅: Visible label is included in accessible name (R): pass
- ✅: Each text input has textbox role (R): pass - Text input fields with textbox role found
- ✅: Helper text is programmatically associated (R): pass
- ✅: Text inputs are keyboard focusable (R): pass
- ✅: Visual labels are defined and persistent (R): pass
- ✅: Required fields are indicated (visually and programmatically) (R): pass
- ❌: Inputs use appropriate autocomplete for purpose (R): fail
- ✅: Placeholder text is programmatically defined as a property (R): pass - No placeholder text present on text inputs
Axe WCAG Failures (1) ❌
- (1x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Sample 1 (Grok 4 Fast Non-Reasoning)
Instruction set: 2. Detailed Instructions
FAIL | Latency 0.00s
Axe WCAG: 1
Assertions ❌
- ✅: Each text input has an accessible name (R): pass
- ✅: Visible label is included in accessible name (R): pass
- ✅: Each text input has textbox role (R): pass - Text input fields with textbox role found
- ✅: Helper text is programmatically associated (R): pass
- ✅: Text inputs are keyboard focusable (R): pass
- ✅: Visual labels are defined and persistent (R): pass
- ✅: Required fields are indicated (visually and programmatically) (R): pass
- ❌: Inputs use appropriate autocomplete for purpose (R): fail
- ✅: Placeholder text is programmatically defined as a property (R): pass - No placeholder text present on text inputs
Axe WCAG Failures (1) ❌
- (1x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Sample 2 (Grok 4 Fast Non-Reasoning)
Instruction set: 2. Detailed Instructions
FAIL | Latency 0.00s
Axe WCAG: 1
Assertions ❌
- ✅: Each text input has an accessible name (R): pass
- ✅: Visible label is included in accessible name (R): pass
- ✅: Each text input has textbox role (R): pass - Text input fields with textbox role found
- ✅: Helper text is programmatically associated (R): pass
- ✅: Text inputs are keyboard focusable (R): pass
- ✅: Visual labels are defined and persistent (R): pass
- ✅: Required fields are indicated (visually and programmatically) (R): pass
- ❌: Inputs use appropriate autocomplete for purpose (R): fail
- ✅: Placeholder text is programmatically defined as a property (R): pass - No placeholder text present on text inputs
Axe WCAG Failures (1) ❌
- (1x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Sample 3 (Grok 4 Fast Non-Reasoning)
Instruction set: 2. Detailed Instructions
FAIL | Latency 0.00s
Axe WCAG: 1
Assertions ❌
- ✅: Each text input has an accessible name (R): pass
- ✅: Visible label is included in accessible name (R): pass
- ✅: Each text input has textbox role (R): pass - Text input fields with textbox role found
- ✅: Helper text is programmatically associated (R): pass
- ✅: Text inputs are keyboard focusable (R): pass
- ✅: Visual labels are defined and persistent (R): pass
- ✅: Required fields are indicated (visually and programmatically) (R): pass
- ❌: Inputs use appropriate autocomplete for purpose (R): fail
- ✅: Placeholder text is programmatically defined as a property (R): pass - No placeholder text present on text inputs
Axe WCAG Failures (1) ❌
- (1x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Sample 4 (Grok 4 Fast Non-Reasoning)
Instruction set: 2. Detailed Instructions
FAIL | Latency 0.00s
Axe WCAG: 1
Assertions ❌
- ✅: Each text input has an accessible name (R): pass
- ✅: Visible label is included in accessible name (R): pass
- ✅: Each text input has textbox role (R): pass - Text input fields with textbox role found
- ✅: Helper text is programmatically associated (R): pass
- ✅: Text inputs are keyboard focusable (R): pass
- ✅: Visual labels are defined and persistent (R): pass
- ✅: Required fields are indicated (visually and programmatically) (R): pass
- ❌: Inputs use appropriate autocomplete for purpose (R): fail
- ✅: Placeholder text is programmatically defined as a property (R): pass - No placeholder text present on text inputs
Axe WCAG Failures (1) ❌
- (1x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
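"Inputs use appropriate autocomplete for purpose" fails in every Grok 4 Fast Non-Reasoning sample above, including those run with Detailed Instructions. The sketch below shows one way such a check could be approximated. It assumes a simple mapping from input `type` to the HTML autofill tokens `email` and `tel`; the benchmark's real rules are not shown in this report, and `EXPECTED_TOKENS`, `AutocompleteChecker`, and `missing_autocomplete` are hypothetical names.

```python
from html.parser import HTMLParser

# Assumed mapping from input type to an HTML autofill token.
EXPECTED_TOKENS = {"email": "email", "tel": "tel"}


class AutocompleteChecker(HTMLParser):
    """Records inputs whose autocomplete attribute lacks the expected token."""

    def __init__(self):
        super().__init__()
        self.missing = []  # list of (input id, expected token)

    def handle_starttag(self, tag, attrs):
        if tag != "input":
            return
        attr = dict(attrs)
        input_type = (attr.get("type") or "text").lower()
        expected = EXPECTED_TOKENS.get(input_type)
        tokens = (attr.get("autocomplete") or "").lower().split()
        if expected and expected not in tokens:
            self.missing.append((attr.get("id") or "?", expected))


def missing_autocomplete(html):
    """Return (input id, expected token) pairs for inputs that need a token."""
    checker = AutocompleteChecker()
    checker.feed(html)
    return checker.missing


# Hypothetical form: the phone field is annotated, the email field is not.
SAMPLE = """
<input id="email" type="email" required>
<input id="phone" type="tel" autocomplete="tel">
"""

print(missing_autocomplete(SAMPLE))
```

Adding `autocomplete="email"` to the first input would clear the remaining finding in this sketch.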